SIMULTANEOUS ESTIMATION OF THE STATE AND NOISE
STATISTICS IN LINEAR DYNAMICAL SYSTEMS*


By Paul D. Abramson, Jr. 
Electronics Research Center 


SUMMARY 


An optimal procedure for estimating the state of a 
linear dynamical system when the statistics of the measure- 
ment and process noise are poorly known is developed. The 
criterion of maximum likelihood is used to obtain an optimal 
estimate of the state and noise statistics. These estimates 
are shown to be asymptotically unbiased, efficient, and 
unique, with the estimation error normally distributed with 
a known covariance. The resulting equations for the 
estimates cannot be solved recursively, but an iterative 
procedure for their solution is presented. Several approxi- 
mate solutions are presented which reduce the necessary 
computations in finding the estimates. Some of the approxi- 
mate solutions allow a real time estimation of the state 
and noise statistics. 

Closely related to the estimation problem is the 
subject of hypothesis testing. Several criteria are 
developed for testing hypotheses concerning the values of 
the noise statistics that are used in the computation of 
the appropriate filter gains in a linear Kalman type state 
estimator. If the observed measurements are not consis- 
tent with the assumptions about the noise statistics, then 
estimation of the noise statistics should be undertaken 
using either optimal or suboptimal procedures. 

Numerical results of a digital computer simulation of 
the optimal and suboptimal solutions of the estimation 
problem are presented for a simple but realistic example. 


*Submitted to the Department of Aeronautics and
Astronautics, Massachusetts Institute of Technology, 
on May 10, 1968, in partial fulfillment of the 
requirements for the degree of Doctor of Science. 
Chapter 1
INTRODUCTION 



1.1 Statement and Discussion of the Problem

Optimal estimation has received considerable attention 
in recent years in fields such as space navigation, statis- 
tical communication theory, and many others that often 
require the estimation of certain variables that are either 
not directly measurable or are being measured with instru- 
ments that are not sufficiently accurate for an adequate 
deterministic solution. In essence, the procedures aim at
reducing the effects of random disturbances associated with
these "imperfect" instruments.

In many situations, the estimation procedure consists 
of no more than averaging repeated measurements of the 
"same" quantity made with the same or different instruments. 
In this way, the random errors made in each measurement 
might "average out," resulting in a higher confidence in the 
value of the quantity being measured than would be the case 
if only a single measurement was taken. In this type of 
operation, the improved confidence in the estimate depends 
upon the fact that the "same" quantity being measured is 
truly time invariant. 

In more complex situations, the quantity being measured 
might change from one measurement time to another. Suppose 
it is known that the voltage across an electrical network 
decreases exponentially with time. A simple average of 
repeated measurements of the voltage made at different times
would lead to an erroneous estimate. However, if the time 
constant associated with the exponential decay is known, then 
each measured voltage can be related to the voltage at any 
specified time. These computed voltages can then be averaged 
to obtain an estimate of the voltage at the specified time. 
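The referral-and-average procedure just described can be sketched in a few lines. This is a minimal illustration, assuming the decay law $v(t) = v(t_0)\,e^{-(t - t_0)/\tau}$ with a known time constant $\tau$; the function name `estimate_voltage_at` and the numbers used are hypothetical.

```python
import math


def estimate_voltage_at(t0, times, measured, tau):
    """Estimate the voltage at time t0 from measurements taken at other times.

    Assumes (hypothetically) v(t) = v(t0) * exp(-(t - t0) / tau) with tau known.
    """
    # Refer each measured voltage to the specified time t0.
    referred = [z * math.exp((t - t0) / tau) for t, z in zip(times, measured)]
    # Every referred value carries the same weight, so a simple average is used.
    return sum(referred) / len(referred)


# Noise-free check with v(t) = 10 * exp(-t / 2): every referred value is 10.
times = [0.5, 1.0, 2.0, 3.0]
measured = [10.0 * math.exp(-t / 2.0) for t in times]
print(round(estimate_voltage_at(0.0, times, measured, tau=2.0), 6))  # 10.0
```

With noisy measurements the same function averages the referred values, so the random errors tend to average out as described above. If the assumed $\tau$ were wrong, the referred values would generally be biased, which is why the time constant must be known.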

The examples illustrated above represent the simplest
case of estimation, in which each measurement carries the same
weight so that simple linear averaging of the measurements is
performed to obtain the estimate. However, if each measure-
ment has associated with it a different confidence, usually
characterized by the variance of the measurement error, then
a more complicated estimation scheme must be employed which
takes into account the differing accuracies of the measure-
ments. Typical examples of this situation are: 1) when two
or more different types of instruments are used to measure
the same quantity, or 2) in the case of the previous example 
when there is some random characteristic in the exponential 
function of the voltage being measured. This leads to a 
reduction in the confidence in relating measurements made at 
some time distant from the specified time. 
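The text does not specify the weighting rule at this point. A common choice, assumed here for illustration, is to weight each measurement by the reciprocal of its error variance; for independent, unbiased measurements with known variances this gives the minimum-variance linear unbiased combination. The function name is hypothetical.

```python
def inverse_variance_average(measurements, variances):
    """Combine independent measurements of one quantity.

    Each measurement is weighted by 1 / variance (an assumed weighting rule).
    Returns the combined estimate and the variance of its error.
    """
    weights = [1.0 / s2 for s2 in variances]
    total = sum(weights)
    estimate = sum(w * z for w, z in zip(weights, measurements)) / total
    return estimate, 1.0 / total


# An accurate instrument (variance 1) and an inaccurate one (variance 9).
est, var = inverse_variance_average([10.0, 20.0], [1.0, 9.0])
print(round(est, 6), round(var, 6))  # 11.0 0.9
```

When all the variances are equal, the weights are equal and the function reduces to the simple average used in the previous examples.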

Operational or computational procedures involving a 
consideration of the variances of the various noises in the 
problem represent the first degree of sophistication in 
estimation. Various formulations have been advanced which 
characterize the statistical nature of the problem in some 
orderly pattern. There are two widely used techniques for 
optimal estimation when the time variation of the quantity 
being measured can be described by a linear differential
equation and when the measurements are linearly related to 
the quantity being estimated. The initial significant work 
on this problem was by Wiener (Ref. 36) who developed the 
condition to be satisfied for optimal estimation in the least 
mean-squared-error sense. This condition is generally 
referred to as the Wiener-Hopf integral equation. He also 
developed the solution for the case of a time invariant
system with stationary noise processes. This work and further
extensions and modifications by others are known as Wiener
filters.

In the Wiener filter, the measurement information is 
acknowledged to have a signal and a noise component. The 
filter, which is usually implemented as a linear analog 
filter, is designed so that the noise component of the 
measurement is more heavily attenuated than the signal com- 
ponent, thus allowing extraction of as much information from 
the measurement as is possible. However, nonstationary,
transient, or multiple input-output problems are difficult
to solve by the Wiener approach.

Kalman (Ref. 16) treated the estimation problem from a 
different point of view and formulated the equivalent of the 
Wiener-Hopf integral equation as a vector-matrix differential 
equation in state space. He developed the solution for a 
linear system with normally distributed noises as a set of 
vector-matrix difference equations which are commonly termed 
the "Kalman filter." Information about the dynamics of the 
process being measured, statistics of the disturbances 
involved, and a priori knowledge of the quantities being
estimated are included in the formulation of the problem. 

In the Kalman filter, the estimation proceeds from any 
chosen starting time and is well suited for situations 
dominated by a transient mode, such as the launching of a 
space vehicle. In the steady state, the Kalman filter can 
be shown to be equivalent to a Wiener filter and thus can be 
considered as a more general formulation of the estimation 
problem. Further advantages of the Kalman filter are that 
the computations are performed recursively, in the time 
domain, and are readily applicable to nonstationary and 
multiple input-output systems. In the standard formulation
of the Kalman estimation procedure, allowance is made for a
variation of the noise variances with respect to time.
However, these variances are assumed to be known prior to the
actual filter operation. In an operational situation, the
time varying filter gains can be precomputed and stored in 
the filter to be used in conjunction with the measurement 
information to obtain the optimal estimate. As an estima- 
tion procedure of the first degree of sophistication, i.e., 
with the consideration of the noise variances, this is indeed 
a very powerful and generally applicable procedure. 

Kalman filtering can be thought of as a method of com- 
bining in an optimal fashion all information up to and includ- 
ing the latest measurement to provide an estimate at that 
time. The proper weighting to apply to the new measurement 
is determined by the relative "quality" of the new information 
as compared to the information contained in the estimate 
before the latest measurement. Poor measurements will receive 
less weight than good ones. If there is noise driving the
system between measurement times, the filter will weight the 
extrapolated value of the old estimate less than if there 
were no noise. This is because noise introduces an uncer- 
tainty in the state of the system between measurement times. 
Consequently the estimate will depend less upon old estimates 
and more upon new measurements. The appropriate measures of 
the "quality" of the old estimate and the new measurement 
are respectively the covariance of the old estimation error 
and the covariance of the new measurement error. 

These important points can be clarified by considering
the following simple example. Let $x_n$ represent the scalar
state of a system at time "n." If the system can be described
by a linear differential equation, then the state at time "n"
can be related to the state at time "n-1" by the difference
equation

$$x_n = \Phi(n,n-1)\,x_{n-1} + \Gamma_n w_n$$

$\Phi(n,n-1)$ is the state transition matrix and extrapolates
the state from time n-1 to time n if the effects of $w_n$ are
ignored. $\Gamma_n$ is the "forcing function matrix" and $w_n$ is the
state "driving noise," which is assumed to be a zero mean
uncorrelated normally distributed noise with variance $Q_n$.

Let $\hat{x}_{n|n-1}$ represent the estimate of $x_n$ obtained after
processing n-1 measurements and let $P_{n|n-1}$ represent the
variance of the estimation error after n-1 measurements.
The measurement at time n is given by

$$z_n = H_n x_n + v_n$$

where $v_n$ is additive noise representing the error in the
measurement and $H_n$ is the "observation matrix" which relates
the measurement to the state. In this example, $z_n$ is a scalar
and $H_n = 1$. It is assumed that $v_n$ is a zero mean
uncorrelated normally distributed noise with variance $R_n$.

The scalar Kalman filter equation for incorporating
this new measurement into the state estimate is given by

$$\hat{x}_{n|n} = \frac{R_n}{R_n + P_{n|n-1}}\,\hat{x}_{n|n-1} + \frac{P_{n|n-1}}{R_n + P_{n|n-1}}\,z_n$$

The variance of the estimation error after incorporation of
the new measurement is given by

$$P_{n|n} = \frac{R_n}{R_n + P_{n|n-1}}\,P_{n|n-1}$$

If the state estimate $\hat{x}_{n|n-1}$ is very good compared with
the information contained in $z_n$, then

$$P_{n|n-1} \ll R_n$$

and thus

$$\hat{x}_{n|n} \approx \hat{x}_{n|n-1}$$

and

$$P_{n|n} \approx P_{n|n-1}$$

In this case, the measurement datum is effectively rejected
because it is so noisy that it is virtually useless. Since
no new information has been added, the variance of the estimation
error remains the same after the measurement.

In the other extreme case, suppose $\hat{x}_{n|n-1}$ is of very
poor quality compared with the information contained in $z_n$.
Then:

$$P_{n|n-1} \gg R_n$$

and thus

$$\hat{x}_{n|n} \approx z_n$$

and

$$P_{n|n} \approx R_n$$

In this case, the estimate $\hat{x}_{n|n-1}$ is effectively rejected and
the estimate $\hat{x}_{n|n}$ is based upon the single measurement $z_n$.

In all cases falling between these two extremes, the estimate
$\hat{x}_{n|n}$ is a linear combination of the old estimate $\hat{x}_{n|n-1}$ and
the new measurement $z_n$.
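The update equations and their two limiting cases can be checked numerically. The sketch below is a direct transcription of the scalar equations for $\hat{x}_{n|n}$ and $P_{n|n}$ with $H_n = 1$; the function name `scalar_update` and the test values are hypothetical.

```python
def scalar_update(x_prior, p_prior, z, r):
    """Scalar Kalman measurement update with H = 1.

    x_post = R/(R+P) * x_prior + P/(R+P) * z
    p_post = R/(R+P) * P
    """
    denom = r + p_prior
    x_post = (r / denom) * x_prior + (p_prior / denom) * z
    p_post = (r / denom) * p_prior
    return x_post, p_post


# Good prior (P << R): the measurement is nearly ignored.
print(scalar_update(1.0, 1e-6, 5.0, 1.0))
# Poor prior (P >> R): the estimate nearly equals the measurement.
print(scalar_update(1.0, 1e6, 5.0, 1.0))
# Equal quality: the estimate is the midpoint and the variance halves.
print(scalar_update(1.0, 1.0, 5.0, 1.0))  # (3.0, 0.5)
```

The two weights always sum to one, which is why $\hat{x}_{n|n}$ falls between $\hat{x}_{n|n-1}$ and $z_n$ in every intermediate case.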

Before computing the proper weighting factors given
above, the variance of the state estimation error before the
measurement at time n must be found. This can be done by
studying how the actual state changes between time n-1 and
time n and how the state estimate changes in this same time
interval. Let $\hat{x}_{n-1|n-1}$ be the estimate of the state $x_{n-1}$
after the measurement at time n-1. Since $w_n$ is a zero mean
independent random variable, the best estimate of the state
at time n based upon the n-1 measurements is given by


$$\hat{x}_{n|n-1} = \Phi(n,n-1)\,\hat{x}_{n-1|n-1}$$

If $P_{n-1|n-1}$ represents the covariance of the estimation
error at time n-1, it can be seen that

$$P_{n|n-1} = \Phi^2(n,n-1)\,P_{n-1|n-1} + \Gamma_n^2 Q_n$$


A large driving noise variance will cause a large increase 
in the mean squared error in the estimate when it is extra- 
polated from one measurement time to the next. 
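Combining the extrapolation and update steps gives the complete recursion for this scalar example. The sketch below assumes, for simplicity, that $\Phi$, $\Gamma$, $Q$, and $R$ are constant (the text allows them to vary with n). The update is written in the equivalent gain form $\hat{x}_{n|n} = \hat{x}_{n|n-1} + K_n(z_n - \hat{x}_{n|n-1})$ with $K_n = P_{n|n-1}/(R_n + P_{n|n-1})$; the simulation parameters are arbitrary illustrative values.

```python
import random


def scalar_kalman(zs, x0, p0, phi, gamma, q, r):
    """Scalar Kalman filter with H = 1 and constant phi, gamma, q, r (assumed).

    Returns the updated estimates x_{n|n} and variances P_{n|n}.
    """
    x, p = x0, p0
    xs, ps = [], []
    for z in zs:
        # Extrapolate from time n-1 to time n.
        x = phi * x
        p = phi**2 * p + gamma**2 * q
        # Incorporate z_n; k = P/(R+P) is the weight on the new measurement.
        k = p / (p + r)
        x = x + k * (z - x)        # same as R/(R+P) * x + P/(R+P) * z
        p = r * p / (r + p)        # same as R/(R+P) * P
        xs.append(x)
        ps.append(p)
    return xs, ps


# Simulate x_n = phi * x_{n-1} + gamma * w_n and z_n = x_n + v_n.
rng = random.Random(0)
phi, gamma, q, r = 0.95, 1.0, 0.1, 1.0
x_true, zs = 5.0, []
for _ in range(50):
    x_true = phi * x_true + gamma * rng.gauss(0.0, q**0.5)
    zs.append(x_true + rng.gauss(0.0, r**0.5))

xs, ps = scalar_kalman(zs, x0=0.0, p0=100.0, phi=phi, gamma=gamma, q=q, r=r)
print(ps[:3], ps[-1])  # P_{n|n} settles to a steady-state value
```

Because the variance recursion does not involve the measurements, the sequence of $P_{n|n}$ (and hence the gains) could be computed in advance, consistent with the remark above that the gains can be precomputed and stored.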

The filter equations given above are for the case of 
a scalar state and measurement. In Chapter 2, the more 
general case of a vector state and measurement is treated. 
However, even in more complicated situations, the same inter- 
pretation can be applied to the operation of the filter. 

The primary purpose of the filter is to compute and apply 
the proper weighting factors so that the new measurement 
information can be incorporated with an old estimate of the 
state to provide a combined and improved state estimate. 

Precise knowledge of the measurement and driving noise 
statistics is of fundamental importance in the operation 
of a Kalman filter. However, in any operational situation, 
the statistics of the noises that are used in the filter 
are in fact only estimates or predictions of the statistics 
of the noises that will actually be encountered. In some 
cases these estimates might be quite accurate, but in other 
cases they may be sufficiently in error to adversely affect
the filter. One effect of this can be a large discrepancy
between the state estimation error covariance matrix as com- 
puted within the filter and the "actual" state estimation 
error covariance. If there is a difference between the com- 
puted and actual covariance of the old state estimate, the 
filter can make an error in computing the weighting for a 
new measurement. This subject is treated fully in Chapter 2 
but it can be understood by considering the following 
example. 

Suppose that it is assumed that there is no noise
driving the state when in fact driving noise is present.
Then the computed covariance of the state estimation error
will generally be smaller than the actual estimation error
covariance. This is because the driving noise introduces
an error in extrapolating the state estimate from one
measurement time to the next which is not accounted for in the
computed state estimation error covariance matrix. The filter
"thinks" it is doing a better job of estimating the state
than is actually the case. If the filter thinks the old state
estimate is much better than it actually is, it may assign
little weight to new measurement information and thus
effectively discard this new information. Of course, this
is exactly the wrong thing to do. The old state estimate may
be of very poor quality so that the new measurement
information should be weighted quite heavily. However, in its
ignorance, the filter fails to do this and as a result the actual
estimation error may become very large while the filter
"thinks" it is doing a good job of estimating the state.
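The mechanism can be illustrated with a small Monte Carlo experiment. In this hypothetical setup the true state is a random walk ($\Phi = \Gamma = 1$) driven by noise of variance 0.1, while the filter assumes no driving noise; the run count, noise levels, and function name are arbitrary choices for illustration. The computed variance keeps shrinking like $1/(n+1)$, while the actual mean squared error grows.

```python
import random


def run_trial(rng, n_steps, q_true, q_assumed, r):
    """One trial of a random-walk state tracked by a filter that uses q_assumed.

    Returns (computed P_{n|n}, actual squared estimation error) at the last step.
    """
    x_true = rng.gauss(0.0, 1.0)     # initial state drawn with variance 1
    x_hat, p = 0.0, 1.0              # filter's prior: mean 0, variance 1
    for _ in range(n_steps):
        x_true += rng.gauss(0.0, q_true**0.5)   # actual driving noise
        p += q_assumed                          # filter's extrapolation
        z = x_true + rng.gauss(0.0, r**0.5)
        k = p / (p + r)
        x_hat += k * (z - x_hat)
        p = r * p / (r + p)
    return p, (x_hat - x_true) ** 2


rng = random.Random(1)
trials = [run_trial(rng, 200, q_true=0.1, q_assumed=0.0, r=1.0) for _ in range(500)]
computed_p = trials[0][0]            # the same in every trial
actual_mse = sum(err for _, err in trials) / len(trials)
print(f"computed P = {computed_p:.4f}, actual mean squared error = {actual_mse:.4f}")
```

The gap between the two printed numbers is the discrepancy between computed and actual error covariance described above; the filter's small computed P drives its gain toward zero, so new measurements are effectively discarded.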

A similar problem can arise in the case of vector mea- 
surements. If the relative quality of the different measure- 
ments is not well known, then more weight might be given to 
a measurement taken with an inaccurate instrument than to a 
very accurate one. This would lead to a greater estimation 
error than would be the case if the relative accuracy of the 
different measurements was known and the proper weighting 
assigned to each. 

A priori estimates of the statistics of the noises can
be obtained in several ways. They may be no more than
educated guesses as to what noise environment may actually 
exist. It is often very difficult to predict with accuracy 
the operating conditions of a complicated and interrelated 
system, especially in research and development applications 
when little may be known before an experiment is conducted. 

Another technique for obtaining the statistics of the 
noises is the analysis of previous experiments. These 
experiments may have been conducted in an operational envi- 
ronment or in the controlled environment of a laboratory. 

In either case, it is rarely possible to have complete 
confidence in the estimates of the noise statistics due to 
the necessarily finite number of experiments that can be 
performed and possible problems associated with the inability
to isolate and distinguish the various effects of the different
noises. There is still a question as to whether the
environment will remain constant between the time these estimates
of the statistics are obtained and the time they are
subsequently used in the Kalman filter.

Thus in many situations, the assumption that the a priori 
estimates of the statistics of the measurement and driving 
noises are good estimates may not be justified. The primary 
objective of this work is to develop an optimal estimator of 
the state that remains optimal when the statistics of these 
noises are not precisely known a priori. In the process of 
estimating the state under these conditions, optimal estimates 
of the measurement and driving noise statistics are also
obtained.

In developing optimal estimators for the state and noise 
statistics, it is not assumed that the statistics of the 
noises are known precisely a priori. Instead, it is assumed 
that the uncertainty in knowledge of these statistics has a 
particular distribution about some a priori value. This is 
completely analogous to the usual assumption made in Kalman 
filtering that the initial state of the system is not known 
precisely, but rather the uncertainty in knowledge of the state 
can be described by a suitable probability density function. 

In both cases, it is assumed that the distribution of the
uncertainty is known a priori. This represents the second
degree of sophistication in estimation procedures. It
reduces by one level the necessary specification of the values
of the noise statistics. Instead of having to specify their
exact values, all that need be specified is the possible
distribution these values might have. In fact, it will
subsequently be shown that the exact shape of this distribution is
relatively unimportant when a large number of measurements
has been taken.

The above discussion can be clarified by considering the 
following simple example. It will be shown that the a priori
estimates of the noise statistics can be improved at the same 
time that the state is being estimated. All measurements 
contain some information about the noises as well as the 
state, whether these measurements are taken in the laboratory 
or in an operational environment. So a procedure can be 
devised to utilize this information about the noises actually 
encountered to improve our knowledge of the noise statistics. 

Suppose the state that is to be estimated is a time
invariant scalar and the measurements of the state are given
by

$$z_n = x + v_n$$

where $x$ is the constant state and $v_n$ is a zero mean independent
normally distributed measurement noise with time invariant
variance $R$. If a single measurement is taken, the optimal
estimate of the state $x$ is given by

$$\hat{x}_{1|1} = z_1$$

and the variance of the state estimation error is given by

$$P_{1|1} = R$$
If repeated measurements are performed, it is easy to show
that the optimal estimate of the state after the nth
measurement is given by

$$\hat{x}_{n|n} = \frac{1}{n}\sum_{j=1}^{n} z_j$$

and the variance of the state estimation error is

$$P_{n|n} = \frac{R}{n}$$

Thus increasing the number of measurements decreases the
variance of the estimation error by the factor (1/n). Note
that in this simple example, the measurement noise variance
is not needed to define the optimal estimate of the state.
This is a consequence of the fact that if the actual
measurement noise variance is assumed to be time invariant, and if
there is no a priori information about the state, then all
measurements are given the same weight, regardless of the
actual value of R. However, the variance of the state
estimation error does depend upon the actual value of R, as given
above. In more complicated situations, such as vector
measurements or when there is noise driving the state, the
optimal state estimate does depend upon the relative sizes of
the noise covariances involved. In the simple case considered
here, however, only the variance of the state estimation error
depends upon R.
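That the estimate does not depend on R here can be checked by running the scalar update recursively for a constant state ($\Phi = 1$, $Q = 0$, $H = 1$) with several values of R. In this sketch a very large initial variance $P_0$ stands in for "no a priori information," which is an assumption of the illustration; the measurement values and the function name are arbitrary.

```python
def recursive_estimate(zs, r, x0=0.0, p0=1e9):
    """Scalar Kalman updates for a constant state (phi = 1, Q = 0, H = 1).

    A very large p0 approximates having no a priori information (assumed).
    Returns the final estimate x_{n|n} and variance P_{n|n}.
    """
    x, p = x0, p0
    for z in zs:
        k = p / (p + r)
        x += k * (z - x)
        p = r * p / (r + p)
    return x, p


zs = [2.1, 1.7, 2.4, 1.9, 2.0]           # sample mean 2.02
for r in (0.01, 1.0, 100.0):
    x, p = recursive_estimate(zs, r)
    # x is essentially the sample mean for every r; p / r is essentially 1 / n.
    print(r, round(x, 6), round(p / r, 6))
```

The variance update is written as $R P/(R + P)$ rather than $(1 - K)P$ so that the very first step, where $K$ is within about $10^{-11}$ of one, does not lose precision.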

If the value of R is unknown, its value can be estimated 
from the measurements themselves. In the above case, when the 
true state is time invariant, an estimate of R can be defined 


by 
$$\hat{R}_n = \frac{1}{n-1}\sum_{j=1}^{n}\left(z_j - \hat{x}_{n|n}\right)^2$$


It is easy to show that such an estimate is an unbiased
estimate of the noise variance. The expected value of $\hat{R}_n$
is given by

$$E(\hat{R}_n) = \frac{1}{n-1}\sum_{j=1}^{n} E\left[\left(z_j - \hat{x}_{n|n}\right)^2\right]$$


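where E( ) represents an average over the ensemble of all
possible measurement noises with covariance R. It can be
seen that

$$z_j - \hat{x}_{n|n} = v_j - \tilde{x}_{n|n}$$

where

$$\tilde{x}_{n|n} = \hat{x}_{n|n} - x$$

and

$$\tilde{x}_{n|n} = \frac{1}{n}\sum_{k=1}^{n} v_k$$

So

$$\left(z_j - \hat{x}_{n|n}\right)^2 = v_j^2 + \frac{1}{n^2}\sum_{k=1}^{n}\sum_{s=1}^{n} v_k v_s - \frac{2}{n}\,v_j\sum_{k=1}^{n} v_k$$

and

$$E\left[\left(z_j - \hat{x}_{n|n}\right)^2\right] = R + \frac{1}{n}R - \frac{2}{n}R = \frac{n-1}{n}R$$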


In obtaining the above expression, use was made of the
independence of the measurement noises at different times. Then

$$E(\hat{R}_n) = \frac{1}{n-1}\sum_{j=1}^{n}\frac{n-1}{n}R = R$$
It can be shown that the variance of the $\hat{R}_n$ estimation
error is given by

$$E\left[\left(\hat{R}_n - R\right)^2\right] = \frac{2R^2}{n-1}$$


Thus as the number of measurements increases, the variance of 
the noise variance estimation error becomes small and R n 
becomes an arbitrarily good estimate of the actual measurement 
noise variance. 
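Both results, the unbiasedness of $\hat{R}_n$ and the error variance $2R^2/(n-1)$ (which relies on the noise being normally distributed), can be checked by simulation. The true state, noise variance, sample size, and number of trials below are arbitrary illustrative values, and `r_hat` is a hypothetical helper name.

```python
import random


def r_hat(zs):
    """Estimate of R: sum of squared deviations from x_{n|n}, divided by n - 1."""
    n = len(zs)
    x_hat = sum(zs) / n                  # x_{n|n}, the sample mean
    return sum((z - x_hat) ** 2 for z in zs) / (n - 1)


rng = random.Random(2)
x_true, R, n, trials = 3.0, 4.0, 10, 20000
estimates = [r_hat([x_true + rng.gauss(0.0, R**0.5) for _ in range(n)])
             for _ in range(trials)]

mean_est = sum(estimates) / trials
var_est = sum((e - R) ** 2 for e in estimates) / trials   # E[(R_hat - R)^2]
print(mean_est, R)                    # ensemble average should be close to R
print(var_est, 2 * R**2 / (n - 1))    # should be close to 2 R^2 / (n - 1)
```

Replacing the divisor $n-1$ by $n$ in `r_hat` would make the ensemble average come out near $(n-1)R/n$ instead, which is the bias the derivation above removes.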

With an estimate of R, an estimate of the state estimation
error variance can be obtained:

$$\hat{P}_{n|n} = \frac{1}{n}\hat{R}_n$$

As was mentioned before, in most cases some estimate of the 
measurement noise variance is available before the above 
measurements are taken. Suppose an estimate of R is obtained 
from a series of measurements and it differs from an a priori 
value obtained by some other means. Now the question is which 
value more accurately represents the variance of the measure- 
ment noise, the a priori value or the value obtained from the 
measurements. The concepts of relative weighting discussed 
in connection with Kalman state estimation offer a solution 
to this problem. 

There is usually some measure of accuracy associated 
with the a priori estimate of R. This measure is often the 
variance of possible deviations of the actual value of R 
about the a priori estimate. If it is felt that the a priori 
estimate is highly accurate, the variance about the true
value would be small. Conversely, if it is felt that the 
a priori estimate of R is highly inaccurate, the variance 
about the true value of R would be large. 

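A combined estimate of the measurement noise variance
can be defined by

$$\hat{R}^c_n = \frac{\sigma^2_{\hat{R}_n}}{\sigma^2_{R_0} + \sigma^2_{\hat{R}_n}}\,\hat{R}_0 + \frac{\sigma^2_{R_0}}{\sigma^2_{R_0} + \sigma^2_{\hat{R}_n}}\,\hat{R}_n$$

where $\hat{R}_0$ is the a priori estimate of R, $\hat{R}_n$ is the estimate
obtained from the measurements, $\sigma^2_{R_0}$ is the variance of the
true value of R about the a priori estimate, and $\sigma^2_{\hat{R}_n}$ is the
variance of the true value about the estimate $\hat{R}_n$. $\sigma^2_{\hat{R}_n}$ is
given by

$$\sigma^2_{\hat{R}_n} = E\left[\left(\hat{R}_n - R\right)^2\right] = \frac{2R^2}{n-1}$$

In order to compute $\sigma^2_{\hat{R}_n}$, the true value of R must be known.
However, for moderately large n, the approximation can be made

$$\sigma^2_{\hat{R}_n} \approx \frac{2\hat{R}_n^2}{n-1}$$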

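By analogy to the state estimation problem, a measure
of the variance of the combined estimate of the measurement
noise variance is given by

$$\sigma^2_{R^c_n} = \frac{\sigma^2_{R_0}\,\sigma^2_{\hat{R}_n}}{\sigma^2_{R_0} + \sigma^2_{\hat{R}_n}}$$

If the a priori estimate $\hat{R}_0$ is of high accuracy compared
with $\hat{R}_n$, then

$$\sigma^2_{R_0} \ll \sigma^2_{\hat{R}_n}$$

and thus

$$\hat{R}^c_n \approx \hat{R}_0$$

and

$$\sigma^2_{R^c_n} \approx \sigma^2_{R_0}$$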



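If the a priori estimate is of low accuracy compared
with $\hat{R}_n$, then

$$\sigma^2_{R_0} \gg \sigma^2_{\hat{R}_n}$$

and thus

$$\hat{R}^c_n \approx \hat{R}_n$$

and

$$\sigma^2_{R^c_n} \approx \sigma^2_{\hat{R}_n}$$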


In all cases falling between these two extremes, the
estimate $\hat{R}^c_n$ is a linear combination of the a priori estimate
and the estimate obtained from the measurements.
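The combination can be sketched as follows, assuming, by analogy with the scalar state update, that the a priori estimate $\hat{R}_0$ is weighted by $\sigma^2_{\hat{R}_n}/(\sigma^2_{R_0} + \sigma^2_{\hat{R}_n})$, the measured estimate $\hat{R}_n$ by $\sigma^2_{R_0}/(\sigma^2_{R_0} + \sigma^2_{\hat{R}_n})$, and that $\sigma^2_{\hat{R}_n}$ is approximated by $2\hat{R}_n^2/(n-1)$. The function name and the numbers in the usage lines are hypothetical.

```python
def combined_r(r0, var_r0, zs):
    """Combine an a priori estimate r0 (variance var_r0) with the sample estimate.

    Weights mirror the scalar state update; the variance of the sample
    estimate uses the large-n approximation 2 * R_n**2 / (n - 1).
    Returns the combined estimate and its variance.
    """
    n = len(zs)
    x_hat = sum(zs) / n
    r_n = sum((z - x_hat) ** 2 for z in zs) / (n - 1)
    var_rn = 2.0 * r_n**2 / (n - 1)
    denom = var_r0 + var_rn
    r_c = (var_rn / denom) * r0 + (var_r0 / denom) * r_n
    var_rc = var_r0 * var_rn / denom
    return r_c, var_rc


zs = [1.0, 3.0]                       # sample estimate R_n = 2, variance 8
print(combined_r(1.0, 8.0, zs))       # (1.5, 4.0): equal confidence
print(combined_r(1.0, 1e-9, zs))      # close to the a priori value 1.0
print(combined_r(1.0, 1e9, zs))       # close to the sample value 2.0
```

With only two measurements the large-n approximation is crude; the example is chosen only to keep the arithmetic easy to follow.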

Of course, the situation is not always as simple as in 
the previous example. The state may be a time-varying vector 
with additive driving noise. The measurements may be vectors 
indicating that several measurement devices of possibly 
differing accuracies are used to measure the state at any time. 
In such cases, the problem of simultaneously estimating the
state and the noise covariances becomes much more complicated.
The resulting equations for optimal estimates of the state
and noise statistics are generally coupled nonlinear equations 
that must be solved by some numerical procedure. But the 
essence of the problem is the same. From the information con- 
tained in the measurements taken in an operational environ- 
ment, improvements can be made in the estimates of not only 
the state but also the statistics of the measurement and 
driving noises. The performance of the state estimator in 
such a situation can be improved compared with the estimator 
that uses incorrect values of the noise statistics in com- 
puting the appropriate filter gains. 

1.2 Historical Background

Optimal state estimation when the statistics of the 
measurement and driving noises are poorly known is but one 
class of problems within the more general area of state esti- 
mation in the presence of "modeling errors." In the formula- 
tion of the Kalman filter, it is assumed that the dynamics of 
the system can be accurately modeled as a set of linear 
differential or difference equations with precisely known 
coefficients. This is reflected in the value of the state 
transition matrix that is used to extrapolate the state 
estimate from one measurement time to the next. In fact, 
the modeling of the system might involve approximations. 

The number of state variables that are necessary to accurate- 
ly model the system might be so great that the number of 
computations needed to estimate all of the variables becomes 
prohibitively large. Often the number of computations can be
reduced by including only the most significant state
variables in the filter model. This will reduce the complexity
of the filter but can introduce additional errors in the 
estimation of the reduced number of state variables. 

It may not be possible to model the system dynamics by 
any set, no matter how large, of linear differential equa- 
tions. The motion of the state might be described by a set 
of nonlinear differential equations which can only be 
approximated by a set of linear differential equations 
describing the motion of the system about some nominal path. 
This too can introduce errors in the state estimation that 
are not accounted for within the model. 

There are other sources of modeling error. The elements 
of the state transition matrices used within the filter may 
not be accurately known. The actual measurements may be a 
nonlinear function of the state although it was assumed in the 
derivation of the filter equations that the measurements are 
a linear function of the state. These nonlinearities may not 
be highly significant but they can cause additional state 
estimation errors. 

All of these "modeling errors," including inaccurately 
known noise statistics, can result in a degradation of the 
Kalman filter performance. 

Many authors have studied the problem of optimal
estimation and control of a linear plant whose parameters may
not be accurately known. A comprehensive list of references
on this subject would be prohibitively long. For this
reason, the only works cited here are those that have some
bearing on the problem of optimal state estimation in the
presence of modeling errors.

Spang (Ref. 34) has studied the problem of optimal 
control of a linear plant with unknown coefficients under 
the assumptions that there is no measurement noise and the 
statistics of the noise driving the state are precisely 
known. He also assumes that the uncertainty in knowledge
of the coefficients describing the plant has some
distribution of values that can be represented by a probability
density function of coefficient values. The optimal control 
signal which minimizes a quadratic error measure is obtained 
by finding the conditional mean of the system tracking error, 
conditioned upon the actual measurements of the system but 
averaged over the distribution of all possible plant 
coefficient values. In this way, the error is minimized 
over the ensemble of all possible trials with systems whose 
parameters vary in a fashion described by the assigned proba- 
bility density function. No attempt is made to estimate the 
actual plant coefficients. Although Spang is concerned 
primarily with optimal control, several of the concepts he 
develops have direct application to optimal state estimation 
when the parameters of the system are unknown. 

Drenick (Ref. 8) has also studied this problem. He 
also assumes that the uncertainty in the parameters of a 
linear plant can be described by a probability density 
function whose first two moments are known. His optimal 
control signal minimizes the conditional mean squared 
tracking error and is a function of the measurements on the 
system and the first two moments of the parameter
distributions. However, using his procedure, there is no way to
estimate the values of the unknown system parameters except 
in a very restricted set of problems. 

Magill (Ref. 21) takes an interesting and rather unique 
approach to the problem of optimal state estimation when 
certain statistical parameters of the problem are unknown. 
These parameters, called the parameter vector, are assumed 
to come from a finite set of values that are known a priori. 
The optimal estimator is composed of a set of Kalman type 
state estimators, with each filter using one of the finite 
number of parameter vectors to compute the proper measure- 
ment gains. The outputs of the filters are weighted and 
added, with the weighting of each filter output being deter- 
mined by the conditional probability that the parameter 
vector being used in that filter is the true parameter vector. 
These conditional probabilities are functions of the
measurements and are obtained by relatively simple but nonlinear
calculations.
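The structure of such an estimator can be sketched for a simple case. In this hypothetical illustration the state is a scalar constant measured directly ($H = 1$), the unknown parameter is the measurement noise variance R, and R is assumed to take one of four candidate values with equal a priori probability. Each filter's conditional probability is updated by the normal likelihood of its own residual, whose variance is $R + P_{n|n-1}$; this is one standard way to compute such probabilities, and the details are not necessarily those of Ref. 21.

```python
import math
import random


def normal_pdf(x, var):
    """Density of a zero-mean normal variable with variance var, at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def filter_bank(zs, r_candidates, x0=0.0, p0=100.0):
    """Bank of scalar filters (constant state, H = 1), one per candidate R.

    Returns the probability-weighted estimate and the final probabilities.
    """
    m = len(r_candidates)
    xs, ps = [x0] * m, [p0] * m
    probs = [1.0 / m] * m                        # equal a priori probabilities
    for z in zs:
        for i, r in enumerate(r_candidates):
            s = ps[i] + r                        # predicted residual variance
            dz = z - xs[i]                       # this filter's residual
            probs[i] *= normal_pdf(dz, s)        # Bayes update (unnormalized)
            xs[i] += ps[i] / s * dz
            ps[i] = ps[i] * r / s
        total = sum(probs)
        probs = [pr / total for pr in probs]
    estimate = sum(pr * x for pr, x in zip(probs, xs))
    return estimate, probs


rng = random.Random(3)
zs = [1.0 + rng.gauss(0.0, 2.0) for _ in range(200)]   # true R = 4
estimate, probs = filter_bank(zs, [0.25, 1.0, 4.0, 16.0])
print(round(estimate, 3), [round(pr, 3) for pr in probs])
```

Normalizing the probabilities after every measurement keeps the running products from underflowing over long measurement sequences.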

The following works are primarily concerned with 
obtaining relatively simple and easy-to-use procedures rather 
than finding an "optimal" solution to the problem. The 
approaches to the problem are quite different but there is 
one common feature. This feature is the real time examina- 
tion of measurement residuals to determine if a Kalman type 
state estimator is performing as predicted. The measurement 
residual is defined as the difference between an actual 
measurement and the predicted measurement, this prediction 
being based upon the predicted state at the time of the
measurement. If the measurement at time n is given by

$$ z_n = H_n x_n + v_n $$

and $\hat{x}_{n|n-1}$ is the estimate of the state $x_n$ before the
measurement $z_n$, then the measurement residual is defined by

$$ \Delta z_n = z_n - H_n \hat{x}_{n|n-1} $$

If there are no modeling errors, $\Delta z_n = H_n (x_n - \hat{x}_{n|n-1}) + v_n$
with the two terms independent, and it is easy to show that
$\Delta z_n$ is a zero mean random variable with covariance

$$ \mathrm{cov}(\Delta z_n) = R_n + H_n P_{n|n-1} H_n^T $$

where $R_n$ is the covariance of the measurement noise $v_n$, and
$P_{n|n-1}$ is the covariance of the state estimation error before
the nth measurement.
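The covariance expression can be checked by simulation. The Python sketch below uses hypothetical scalar values for $H_n$, $R_n$, and $P_{n|n-1}$. It draws many independent trials in which the true state is distributed about the prediction with covariance $P_{n|n-1}$, forms $\Delta z_n$, and compares the sample variance with $R_n + H_n P_{n|n-1} H_n^T$.

```python
import math
import random


def residual_moments(p_pred, r, h, x_pred=2.0, trials=100_000, seed=0):
    """Sample mean and variance of dz = z - h * x_pred over independent trials
    in which x ~ N(x_pred, p_pred) and z = h * x + v with v ~ N(0, r)."""
    rng = random.Random(seed)
    res = []
    for _ in range(trials):
        x = rng.gauss(x_pred, math.sqrt(p_pred))
        z = h * x + rng.gauss(0.0, math.sqrt(r))
        res.append(z - h * x_pred)
    mean = sum(res) / trials
    var = sum((d - mean) ** 2 for d in res) / (trials - 1)
    return mean, var


p_pred, r, h = 3.0, 0.5, 2.0          # hypothetical P_{n|n-1}, R_n, H_n
mean, var = residual_moments(p_pred, r, h)
predicted = r + h * p_pred * h        # R_n + H_n P_{n|n-1} H_n^T = 12.5
```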

Jazwinski (Ref. 15) has suggested introducing into the 
model of the dynamics of the system a zero mean random driving 
noise which in some sense can account for the effect of any 
modeling error. However, the covariance of this noise is 
not known a priori since it is not known what modeling 
errors are actually present. Jazwinski proposes a simple 
and reportedly effective procedure for determining how much 
"driving noise" to introduce into the model based upon an 
examination of a single residual at a time. If the squared 
residual is much larger than predicted by the filter, the 
computed covariance of the old state estimate is artificially
increased at that time so that the new measurement is
weighted more heavily than would be the case if no adjustment
were made. In this way possible divergence problems in the
filter are minimized because as soon as the residuals become 
large, indicating that there is an error in the model, the 
measurements are weighted heavily. This tends to reduce the 
estimation error to a level consistent with that predicted 
by the filter. 
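The following scalar Python sketch shows a residual-triggered adjustment in the spirit of the procedure described above. The inflation rule, the threshold of nine (a three-standard-deviation residual), and the function name are illustrative assumptions, not Jazwinski's published algorithm.

```python
def inflated_update(x_pred, p_pred, z, r, h=1.0, threshold=9.0):
    """Scalar measurement update with residual-driven covariance inflation.

    Hypothetical rule: if the squared residual exceeds `threshold` times its
    predicted variance h*p*h + r, raise the predicted state variance so that
    h*p*h + r equals the observed squared residual, then compute the gain.
    """
    resid = z - h * x_pred
    s = h * p_pred * h + r
    if resid * resid > threshold * s:
        # Residual too large for the filter's own statistics: inflate P.
        p_pred = (resid * resid - r) / (h * h)
        s = h * p_pred * h + r
    gain = p_pred * h / s
    x = x_pred + gain * resid
    p = (1.0 - gain * h) * p_pred
    return x, p, gain


# A small residual leaves the nominal gain p/(p + r) = 0.5 unchanged,
# while a large one raises the gain so the new measurement dominates.
x_small, p_small, g_small = inflated_update(0.0, 1.0, 0.5, 1.0)
x_large, p_large, g_large = inflated_update(0.0, 1.0, 10.0, 1.0)
```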

No attempt is made to estimate the value of the 
covariance of the added driving noise since in fact it does 
not exist. It was included to account for any unknown 
modeling errors. Even if the covariance is estimated, 
such an estimate would have little statistical significance 
since it would be based upon an examination of a single 
measurement residual. So Jazwinski's procedure should be 
viewed as an attempt to reduce the effect of modeling errors 
on the filter operation rather than an attempt to improve 
our knowledge of the model. 

Dennis (Ref. 5) addresses himself to a more compli- 
cated problem, that of estimating the effects of errors in 
modeling the dynamics of the system as well as estimating 
the covariances of the measurement and driving noises. Only 
his procedure for estimating the statistics of the noises is 
of interest here. 

Dennis develops expressions for a real time estimator 
of the measurement and driving noise covariances. The 
estimates are subsequently used in the computation of the
appropriate weighting gains in a Kalman state estimator.
Dennis' solution for the estimation of the noise statistics 
is suboptimal in the sense that no optimality criterion is 
used in defining these estimates. The expressions were 
obtained by an examination of the characteristics of quadratic 
functions of certain measurement residuals. From this 
examination a reasonable, if not optimal, estimator is 
postulated. However, in many useful applications there are 
several problems associated with the use of this estimator. 

It is not always possible to estimate all of the unknown 
elements of the measurement and driving noise covariance 
matrices. Depending upon the dimension and nature of the 
measurement, some or all of the elements of the driving 
noise covariance may not be observable, and as a result, a 
singular situation is created. There are also certain 
situations when the estimators may be biased and result in 
estimates that do not converge to the true values of the 
noise covariances as the number of measurements becomes 
large. Dennis does not develop expressions for the evalu- 
ation of the quality of the noise covariance estimates. Such 
measures of quality would be needed if it is desired to 
incorporate the estimates obtained from the measurements 
with some a priori estimates to obtain a combined estimate 
based upon a priori knowledge of the noise covariances and 
the information contained in the measurements. 

Shellenbarger (Ref. 31) is exclusively concerned with 
estimating the values of the measurement and driving noise 
covariances so that the proper gains can be computed for 
estimating the state. His technique is aimed at finding an
approximate solution to this problem and consequently his
estimator for these parameters is suboptimal. He bases his
estimates of the noise covariance parameters upon an examination of a single measurement residual at a time. If the
measurement is of small dimension compared with the number
of covariance parameters being estimated, there is no
unique solution for all of the noise covariance parameters.
In addition to this, there is also a question of a possible
bias in the noise covariance estimator.

The work of Smith (Ref. 33) is even more restricted in
that he attempts to estimate only the measurement noise
covariance, assuming that the dynamical model of the state
and the covariance of the driving noise are known precisely.
His work results in a suboptimal estimator for the state
and measurement noise covariance. Here too there is a
question of a possible bias in the noise covariance estimator.

Because of the relevance of noise covariance estimation 
to this work, a short review of the procedures of Dennis, 
Shellenbarger, and Smith is included in Chapter 4. Although
their procedures are suboptimal and there are problems asso- 
ciated with implementing their estimators in certain cases, 
it is felt that there are some situations when these estimators 
provide an adequate solution to the problem of inaccurately 
known noise statistics. Their procedures are much simpler 
than the optimal procedures developed in Chapter 3 and provide
some insight into the variety of techniques that are available 
for an approximate solution to the problem. 
1.3 Summary of Thesis

As was previously mentioned, the primary objective of 
this thesis is the development of an optimal estimator of 
the state and statistics of the measurement and driving 
noises. However, several other related subjects are also 
treated. 

In Chapter 2, it will be shown that a biased or corre- 
lated measurement or driving noise can be estimated using 
a linear recursive filter identical in form to the usual 
Kalman filter for estimating the state. This is a conse- 
quence of the fact that such a biased or correlated noise is 
observable in terms of a linear function of the measurements. 

It will also be shown that an error in the values of the 
measurement and driving noise covariances used to compute 
Kalman filter gains does not produce an observable effect 
in a linear function of the measurements. Therefore, any 
estimator of these covariances is inherently a nonlinear 
estimator since a nonlinear function of the measurements is
needed in the estimation loop. In the simple example given 
previously, it was shown that an estimator for the measure- 
ment noise variance is a quadratic function of the measurements. 
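The distinction between linear and quadratic functions of the measurements can be seen numerically. This Python sketch assumes a scalar random-walk model with a known driving-noise variance. It runs one filter with the correct measurement-noise variance and one with a value four times too small. For each filter it reports the mean residual, a linear statistic, and the mean of $\Delta z_n^2$ divided by its predicted variance, a quadratic statistic.

```python
import math
import random


def residual_statistics(zs, q, r_assumed, x0=0.0, p0=0.0):
    """Scalar random-walk Kalman filter that assumes measurement-noise
    variance r_assumed.  Returns the mean residual (linear in the
    measurements) and the mean normalized squared residual (quadratic)."""
    x, p = x0, p0
    lin, quad = 0.0, 0.0
    for z in zs:
        p_pred = p + q
        resid = z - x
        s = p_pred + r_assumed       # residual variance predicted by the filter
        lin += resid
        quad += resid * resid / s
        gain = p_pred / s
        x += gain * resid
        p = (1.0 - gain) * p_pred
    n = len(zs)
    return lin / n, quad / n


rng = random.Random(2)
q, r_true = 1.0, 4.0
x, zs = 0.0, []
for _ in range(2000):
    x += rng.gauss(0.0, math.sqrt(q))
    zs.append(x + rng.gauss(0.0, math.sqrt(r_true)))

lin_ok, quad_ok = residual_statistics(zs, q, r_true)   # correct R
lin_bad, quad_bad = residual_statistics(zs, q, 1.0)    # R too small by 4x
```

In both runs the mean residual stays near zero, so it carries no information about the error in R. The quadratic statistic stays near one only for the correctly specified filter.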

Initially an attempt was made to formulate the problem 
of noise covariance estimation in terms of minimum variance 
estimation, but the nonlinearities in the problem immediately 
produced great analytical difficulties. This is one of the
reasons why the criterion of maximum likelihood was chosen
to define the optimal estimates of the state and the noise
statistics. As the name might imply, maximum likelihood
estimates are the most probable values of the state and
statistics for a given set of measurements. The techniques 
of maximum likelihood result in complicated equations, but 
the theory of maximum likelihood estimators is sufficiently 
developed to allow a proper handling of the problem. 

The points of maximum likelihood are found by setting 
the derivatives of a suitable likelihood function to zero 
and then solving the resulting equations for the unknown 
parameters. There is a likelihood equation associated with 
each parameter being estimated. When the noise covariance 
matrices are assumed to be time invariant, the solution of 
the likelihood equations for the optimal state estimate is 
just a Kalman type estimator which uses the optimal estimates 
of the noise covariances to compute the appropriate filter 
gains. Unfortunately, there is no general closed form 
solution of the likelihood equations for these optimal noise 
covariance estimates. However, an iterative procedure is 
proposed for the solution of the likelihood equations corre- 
sponding to the estimates of the noise covariances. These 
estimates are shown to be asymptotically unbiased, efficient, 
consistent, and unique, with the estimation error normally 
distributed with a known covariance. 
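For a scalar random-walk model, the likelihood of the measurements can be written in terms of the Kalman filter residuals. This makes the maximum likelihood estimate of a single noise variance easy to illustrate. The Python sketch below assumes a known driving-noise variance and a known initial state, and maximizes the likelihood over a grid of candidate values of R. The brute-force grid search stands in for the iterative procedure of Chapter 3; it does not reproduce that procedure.

```python
import math
import random


def log_likelihood(zs, q, r, x0=0.0, p0=0.0):
    """ln f(Z_n | r) for a scalar random walk observed in noise, built from
    the Kalman filter residuals: each residual is N(0, p_pred + r)."""
    x, p, total = x0, p0, 0.0
    for z in zs:
        p_pred = p + q
        resid = z - x
        s = p_pred + r
        total -= 0.5 * (math.log(2.0 * math.pi * s) + resid * resid / s)
        gain = p_pred / s
        x += gain * resid
        p = (1.0 - gain) * p_pred
    return total


rng = random.Random(3)
q, r_true = 1.0, 4.0
x, zs = 0.0, []
for _ in range(2000):
    x += rng.gauss(0.0, math.sqrt(q))
    zs.append(x + rng.gauss(0.0, math.sqrt(r_true)))

grid = [0.5 + 0.1 * i for i in range(76)]          # candidate values of R
r_hat = max(grid, key=lambda r: log_likelihood(zs, q, r))
```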

In addition to the optimal solution discussed in 
Chapter 3, several suboptimal solutions of the problem are 
given in Chapter 4. These solutions can result in a major 
savings in the computational requirements but they do not 
have the wide range of applicability of the optimal solution. 



Chapter 5 is devoted to a discussion of hypothesis testing. 
Hypothesis testing is closely related to the estimation problem. 
Certain criteria are developed for making decisions as to 
whether observed measurements are consistent with assumptions 
about the statistics of the measurement and driving noises. 
However, the tests themselves do not allow a determination of 
the reasons the measurements fail a particular hypothesis 
test, but rather indicate that there is some error in the 
model of the system and/or measurement. The tests can 
usually be conducted at less computational expense than a 
more complicated noise covariance estimation procedure, so 
they can be used to determine if such additional estimation 
should be conducted. 
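The Python sketch below shows one simple consistency test of this kind for a scalar random-walk model. Under the hypothesis that the assumed noise variances are correct, the normalized squared residuals $\Delta z_n^2 / (R_n + H_n P_{n|n-1} H_n^T)$ are independent chi-square variables with one degree of freedom. Their sum over N measurements therefore has mean N and variance 2N. The normal-approximation threshold and the function name are assumptions of this sketch, not the criteria developed in Chapter 5.

```python
import math
import random


def residual_consistency_test(zs, q, r, n_sigma=3.0, x0=0.0, p0=0.0):
    """Reject the hypothesis that (q, r) are the true noise variances when
    the sum of normalized squared residuals lies more than n_sigma standard
    deviations sqrt(2N) away from its expected value N."""
    x, p, stat = x0, p0, 0.0
    for z in zs:
        p_pred = p + q
        resid = z - x
        s = p_pred + r
        stat += resid * resid / s
        gain = p_pred / s
        x += gain * resid
        p = (1.0 - gain) * p_pred
    n = len(zs)
    reject = abs(stat - n) > n_sigma * math.sqrt(2.0 * n)
    return stat, reject


rng = random.Random(4)
x, zs = 0.0, []
for _ in range(1000):
    x += rng.gauss(0.0, 1.0)                 # true Q = 1
    zs.append(x + rng.gauss(0.0, 2.0))       # true R = 4

stat_ok, reject_ok = residual_consistency_test(zs, 1.0, 4.0, n_sigma=4.0)
stat_bad, reject_bad = residual_consistency_test(zs, 1.0, 1.0)
```

As the text notes, a rejection indicates only that some assumption is wrong. The same statistic would also respond to an error in Q or in the dynamics.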

In Chapter 6, the numerical results of a computer simu- 
lation of the theoretical results are presented. The optimal 
and suboptimal estimators are simulated to study their perfor- 
mance in a simple but realistic situation. The techniques 
of hypothesis testing are also studied to find the power of 
certain tests in detecting errors in the values of the noise 
statistics used within a Kalman filter. 
Chapter 2

EXPECTATION OPERATORS AND 
MAXIMUM LIKELIHOOD ESTIMATION 

2.1 Introduction

In this chapter two types of expectation operators are 
defined and maximum likelihood parameter estimation discussed. 
A precise understanding of the expectation operator notation 
is necessary for subsequent work, so important definitions 
and results are given here. The maximum likelihood equations 
are utilized to establish the notation and results of the 
familiar linear state estimation problem with and without 
the use of a priori information about the state. The question 
of unbiasedness and the covariance of the state estimate in 
the presence of inaccurately known noise statistics is also 
discussed. More general parameter estimation problems and a 
more detailed examination of the properties of maximum likeli- 
hood estimators are treated in Chapter 3. 

2.2 Conditional and Unconditional Expectation Operators

Let x and y be random variables (possibly vector valued)
with joint probability density function f(x,y) defined over
the range $-\infty < x, y < \infty$. The conditional expectation, or mean,
of x, conditioned upon the value of y is defined by

$$ e(x|y) = \int_{-\infty}^{\infty} x\, f(x|y)\, dx \qquad (2.2.1) $$
where f(x|y) is the conditional probability density function
of x given y. Define

$$ f(x) \triangleq \int_{-\infty}^{\infty} f(x,y)\, dy, \qquad f(y) \triangleq \int_{-\infty}^{\infty} f(x,y)\, dx $$

Applying Bayes' rule,

$$ f(x|y) = \frac{f(x,y)}{f(y)} \qquad (2.2.2) $$


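The unconditional expectation of x is defined by

$$ \begin{aligned} E(x) &\triangleq \int_{-\infty}^{\infty} x\, f(x)\, dx = \int_{-\infty}^{\infty} x \left[ \int_{-\infty}^{\infty} f(x,y)\, dy \right] dx \\ &= \int_{-\infty}^{\infty} x \left[ \int_{-\infty}^{\infty} f(x|y)\, f(y)\, dy \right] dx = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} x\, f(x|y)\, dx \right] f(y)\, dy \\ &= \int_{-\infty}^{\infty} e(x|y)\, f(y)\, dy \end{aligned} \qquad (2.2.3) $$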


The first expectation, e(x|y), is the expected value of 
x if y were fixed at the conditioned value. It is found by 
averaging over all other random influences with a constant 
value of y. The second expectation, E (x) , is the expected 
value of x which represents an average over the distribution 
of y as well as over all other random influences.

The conditional covariance of x is defined by

$$ \mathrm{cov}(x|y) = e\left( (x - e(x|y))(x - e(x|y))^T \,|\, y \right) = e(x x^T | y) - e(x|y)\, e(x^T|y) \qquad (2.2.4) $$

The unconditional covariance of x is defined by

$$ \mathrm{cov}(x) = E\left( (x - E(x))(x - E(x))^T \right) = E(x x^T) - E(x)\, E(x^T) \qquad (2.2.5) $$

But

$$ E(\mathrm{cov}(x|y)) = E(x x^T) - E\left( e(x|y)\, e(x^T|y) \right) = \mathrm{cov}(x) + E(x)\, E(x^T) - E\left( e(x|y)\, e(x^T|y) \right) $$

and

$$ \mathrm{cov}(e(x|y)) = E\left( e(x|y)\, e(x^T|y) \right) - E(x)\, E(x^T) $$

so

$$ \mathrm{cov}(x) = E(\mathrm{cov}(x|y)) + \mathrm{cov}(e(x|y)) \qquad (2.2.6) $$

Thus the unconditional covariance can always be decomposed
into the sum of two components: 1) the average conditional
covariance and 2) the covariance of the conditional average.
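The decomposition (2.2.6) can be checked exactly for a small discrete joint distribution. The Python sketch below uses scalar x, a hypothetical four-point distribution, and exact rational arithmetic, so the identity holds with equality rather than only to within rounding error.

```python
from fractions import Fraction

# Hypothetical joint distribution of scalar x and y: {(x, y): probability}.
JOINT = {
    (0, 0): Fraction(1, 8), (2, 0): Fraction(1, 8),
    (1, 1): Fraction(1, 4), (5, 1): Fraction(1, 2),
}


def marginal_y(joint):
    """f(y): sum the joint probabilities over x."""
    fy = {}
    for (_, y), p in joint.items():
        fy[y] = fy.get(y, Fraction(0)) + p
    return fy


def conditional_moments(joint, y):
    """Return e(x|y) and cov(x|y) = e(x^2|y) - e(x|y)^2."""
    py = marginal_y(joint)[y]
    mean = sum(x * p for (x, yy), p in joint.items() if yy == y) / py
    second = sum(x * x * p for (x, yy), p in joint.items() if yy == y) / py
    return mean, second - mean * mean


fy = marginal_y(JOINT)
mean_x = sum(x * p for (x, _), p in JOINT.items())                      # E(x)
cov_x = sum(x * x * p for (x, _), p in JOINT.items()) - mean_x ** 2     # cov(x)

avg_cond_cov = sum(conditional_moments(JOINT, y)[1] * p for y, p in fy.items())
cond_means = {y: conditional_moments(JOINT, y)[0] for y in fy}
cov_cond_mean = sum(m * m * fy[y] for y, m in cond_means.items()) - mean_x ** 2

print(cov_x, avg_cond_cov + cov_cond_mean)   # prints: 17/4 17/4
```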

The use of the conditional and unconditional expectation
operators in this work is somewhat unconventional because the
random variables y may represent the parameters of the
probability density function of x. It is not usual to think
of the parameters of a probability density function as themselves being random variables. However, in situations where
it is desired to estimate the values of these parameters on 
the basis of observed values of a random variable x, by con- 
sidering y to be a random variable any a priori information 
about the value of y can be utilized coherently in forming an 
a posteriori estimate of the value of y. It may not seem 
legitimate to regard the value of y as itself being the 
outcome of a random experiment. Usually it is more natural 
to regard y simply as a fixed, though unknown, constant which
appears as a parameter in the x distribution from which sample 
values are taken. However, if this approach is used, there 
is no way to utilize a priori information about y and accord- 
ingly the performance of the estimator would be degraded. 

In the extreme case when no a priori information about y 
exists, then introduction of the concept of an initial
distribution for y would be unjustified and of no practical 
use. In the other extreme case when it is assumed that the 
parameters are known precisely a priori, then the probability 
density function of y would reduce to impulses at the known 
values of the parameters. However, in such a situation, in 
the absence of any other random influences on y, there would 
be no need for the entire estimation process since it is 
assumed that the values of y are known. In all cases falling 
between these two extremes, by introduction of a realistic if 
not precisely correct density function for y, the realities 
of the situation can be more closely modeled than by consid- 
ering that the parameters y are either exactly known or 
completely unknown a priori.

The above discussion can be illustrated by a simple
example. Let x be a normal variable with mean m and variance
s, with conditional probability density function f(x|m,s).
Furthermore let m and s be random variables with a joint
probability density function f(m,s). For simplicity it is
assumed that s and m are independent, so f(m,s) = f(m) f(s).

The conditional mean of x is

$$ e(x|m,s) = \int_{-\infty}^{\infty} x\, f(x|m,s)\, dx $$

But

$$ f(x|m,s) = \frac{1}{(2\pi s)^{1/2}} \exp\left[ -\frac{1}{2} \frac{(x-m)^2}{s} \right] $$

so $e(x|m,s) = m$, independent of s.

The unconditional mean of x is

$$ E(x) = \int\!\!\int_{-\infty}^{\infty} e(x|m,s)\, f(m,s)\, dm\, ds = \int_{-\infty}^{\infty} m\, f(m)\, dm = \bar{m} $$

The conditional variance of x is

$$ e\left( (x-m)^2 \,|\, m,s \right) = \int_{-\infty}^{\infty} (x-m)^2 f(x|m,s)\, dx = s $$

The unconditional variance of x is

$$ \begin{aligned} E\left( (x-\bar{m})^2 \right) &= \int\!\!\int_{-\infty}^{\infty} e\left( (x-\bar{m})^2 \,|\, m,s \right) f(m,s)\, dm\, ds \\ &= \int\!\!\int_{-\infty}^{\infty} \left( s + (m-\bar{m})^2 \right) f(m)\, f(s)\, dm\, ds \\ &= \bar{s} + \overline{m^2} - \bar{m}^2 \end{aligned} $$

where

$$ \bar{s} = \int_{-\infty}^{\infty} s\, f(s)\, ds, \qquad \overline{m^2} = \int_{-\infty}^{\infty} m^2 f(m)\, dm $$

Note that $E((x-\bar{m})^2) \neq E((x-m)^2) = \bar{s}$ unless $\overline{m^2} = \bar{m}^2$, that is, unless m has zero variance.
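The following Monte Carlo check of this example in Python uses the hypothetical choices that $f(m)$ is normal with mean 1 and variance 0.25 and that $f(s)$ is uniform on [0.5, 1.5]. These choices give $\bar{m} = 1$, $\bar{s} = 1$, and $\overline{m^2} - \bar{m}^2 = 0.25$. The sample averages should therefore approach $E(x) = 1$, $E((x-\bar{m})^2) = 1.25$, and $E((x-m)^2) = 1$.

```python
import math
import random


def sample_hierarchical(n, seed=0):
    """Draw (x, m, s): m ~ N(1, 0.25) and s ~ Uniform(0.5, 1.5) independent,
    then x ~ N(m, s).  These densities are illustrative assumptions."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        m = rng.gauss(1.0, 0.5)            # standard deviation 0.5, variance 0.25
        s = rng.uniform(0.5, 1.5)          # mean s_bar = 1.0
        x = rng.gauss(m, math.sqrt(s))     # conditional density f(x|m,s)
        draws.append((x, m, s))
    return draws


draws = sample_hierarchical(100_000)
n = len(draws)
m_bar = 1.0
mean_x = sum(x for x, _, _ in draws) / n                         # estimates E(x) = m_bar
var_about_mbar = sum((x - m_bar) ** 2 for x, _, _ in draws) / n  # estimates s_bar + var(m) = 1.25
var_about_m = sum((x - m) ** 2 for x, m, _ in draws) / n         # estimates s_bar = 1.0
```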


2.3 Maximum Likelihood State Estimation

In this section the theory of maximum likelihood estima- 
tion is discussed and applied to the estimation of the state 
of a linear dynamical system which is driven by white noise 
and observed by linear noisy measurements. Because of the 
relative simplicity of the equations for determining the state 
estimate, much can be said about the performance of the 
estimator. In more complicated situations, such as estimating 
the covariance of the measurement and driving noises, 
evaluation of the estimator behavior is considerably more 
difficult and requires a more thorough analysis. For this 
reason the discussion of these situations is deferred until
Chapter 3.

Maximum likelihood estimation, as the name might imply, 
is concerned with finding the maximum of a likelihood function 
defined as a function of the parameters being estimated and 
the measurements on the system. Let Z denote the realized
values of a set of measurements and $a^T = (a^1, a^2, \ldots, a^m)$ be
the vector of parameters belonging to a set of all possible
parameter values $\Omega$. Further, let f(Z|a) denote the conditional
probability density function of the measurements Z given the
value of the parameter a. The likelihood function is then
defined by

$$ l(a, Z) = f(Z|a) \qquad (2.3.1) $$

The principle of maximum likelihood consists of accepting
$\hat{a}^T = (\hat{a}^1, \hat{a}^2, \ldots, \hat{a}^m)$ as the estimate of $a^T$, where

$$ l(\hat{a}, Z) = \max_{a} l(a, Z) \qquad (2.3.2) $$


There may be a set of samples for which $\hat{a}$ does not exist.
Under suitable regularity conditions on f(Z|a), the frequency
of such samples can be shown to be negligible.


In practice it is convenient to work with the natural
logarithm of l(a,Z), in which case $\hat{a}$ in (2.3.2) satisfies the
equation

$$ L(\hat{a}, Z) = \ln l(\hat{a}, Z) = \max_{a} L(a, Z) \qquad (2.3.3) $$

When the maximum in (2.3.3) is attained at an interior point
of $\Omega$, and L(a,Z) is a differentiable function of a, then the
partial derivatives vanish at that point, so that $\hat{a}$ is a
solution of the equation

$$ \left. \frac{\partial L(a, Z)}{\partial a} \right|_{a = \hat{a}} = 0 \qquad (2.3.4) $$
Equation (2.3.4) is called the maximum likelihood equation
and any solution of it a maximum likelihood estimate. The
function $\hat{a}$ defined by (2.3.3) over the sample space of
observations Z is called a maximum likelihood estimator.
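As a small illustration of (2.3.1) through (2.3.4), consider independent normal measurements with unknown mean and variance, $a = (\mu, \sigma^2)$. This is a hypothetical example, not one taken from the text. Setting both partial derivatives of the log-likelihood to zero gives the sample mean and the sample variance with divisor n. The Python sketch computes these estimates and checks that perturbing either parameter lowers the log-likelihood.

```python
import math


def log_likelihood(mu, var, data):
    """L(a, Z) = ln f(Z|a) for independent N(mu, var) samples, a = (mu, var)."""
    n = len(data)
    ss = sum((z - mu) ** 2 for z in data)
    return -0.5 * n * math.log(2.0 * math.pi * var) - 0.5 * ss / var


def ml_estimate(data):
    """Zero of the likelihood equations dL/dmu = 0 and dL/dvar = 0."""
    n = len(data)
    mu = sum(data) / n
    var = sum((z - mu) ** 2 for z in data) / n   # divisor n, not n - 1
    return mu, var


data = [1.2, 0.7, 2.9, 1.8, 0.4, 1.5]   # hypothetical measurements
mu_hat, var_hat = ml_estimate(data)
best = log_likelihood(mu_hat, var_hat, data)

# Perturbing either parameter should lower the log-likelihood.
for d_mu, d_var in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert log_likelihood(mu_hat + d_mu, var_hat + d_var, data) < best
```

The variance estimate is biased by the factor (n-1)/n. This shows that a maximum likelihood estimate need not be unbiased for a finite sample even when it is asymptotically unbiased.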

If a priori information about the parameters being 
estimated exists and if the a priori uncertainty in knowledge 
of these parameters can be formulated as an a priori proba- 
bility density function for a, then a slightly different 
likelihood function can be defined so that this a priori 
information can be used in an optimal fashion. In such cases, 
the augmented likelihood function is defined by 

$$ l_A(a, Z) = f(a|Z) \qquad (2.3.5) $$

where f(a|Z) is the conditional probability density function
of the parameters a given the measurements Z. By application
of Bayes' rule it can be seen that

$$ f(a|Z) = \frac{f(Z|a)\, f(a)}{f(Z)} $$

where f(a) is the a priori probability density function of a
and f(Z) is the unconditional probability density function
of Z, found by

$$ f(Z) = \int_{\Omega} f(Z|a)\, f(a)\, da $$


In this case the logarithm of the augmented likelihood 
function (2.3.5) is

$$ L_A(a, Z) = \ln l_A(a, Z) = \ln f(Z|a) + \ln f(a) - \ln f(Z) \qquad (2.3.6) $$

and

$$ \frac{\partial L_A(a, Z)}{\partial a} = \frac{\partial L(a, Z)}{\partial a} + \frac{\partial \ln f(a)}{\partial a} \qquad (2.3.7) $$

The inclusion of a priori information about a has a tendency 
to shift the zero points of (2.3.7) towards the peak of the 
a priori parameter density function. If a priori information 
about a exists, it is usually preferable to utilize the formulation f(a|Z) since this allows utilization of all information about the value of a, both from the a priori information
and information derived from the measurements Z. However, it
should be realized that if the assigned a priori probability
density function of the parameters does not accurately represent possible variations in the parameters, the performance
of the estimator may in fact be degraded by inclusion of a
priori information. When studying the performance of an
estimator, there is some justification for looking first at
an estimator which does not utilize a priori information.
This allows determination of how effectively a given estimator extracts information from the measurements without
considering how this estimate might be incorporated with an
a priori estimate to obtain a combined estimate.
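The shift toward the peak of the a priori density can be seen in the simplest case: scalar measurements $z_i \sim N(a, r)$ with an a priori density $a \sim N(a_0, p_0)$. Both log-likelihoods are quadratic in a, so (2.3.4) and (2.3.7) have closed-form zeros. The Python sketch and its numbers are illustrative assumptions.

```python
def ml_and_augmented(zs, r, a0, p0):
    """Estimates of the mean a of independent N(a, r) measurements.

    a_ml  solves dL/da = 0 with L = ln f(Z|a)                   (no prior)
    a_aug solves dL_A/da = 0 with L_A = ln f(Z|a) + ln f(a) + const,
          where the a priori density is a ~ N(a0, p0).
    """
    n = len(zs)
    total = sum(zs)
    a_ml = total / n
    a_aug = (a0 / p0 + total / r) / (1.0 / p0 + n / r)
    return a_ml, a_aug


zs = [2.1, 1.7, 2.6, 2.2]   # hypothetical measurements
a_ml, a_aug = ml_and_augmented(zs, r=1.0, a0=0.0, p0=0.5)
# a_aug lies between the prior peak a0 = 0 and a_ml.
```

As $p_0$ grows, the a priori term vanishes and the augmented estimate approaches the estimate without a priori information.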

In the derivation of the maximum likelihood state esti- 
mation equations, it is first assumed that a priori informa- 
tion about the state does exist so that the latter form of 
the likelihood function is employed. After the solution of
this problem is obtained, the equations for estimating the
state without a priori information will be given.

Both solutions of the state estimation equations should 
more correctly be called conditional maximum likelihood 
estimates because the optimality of such estimates is condi- 
tioned upon the assumption that the noise driving the state 
and corrupting the measurements of the state have a known 
distribution with precisely known parameters. If this 
assumption is not valid, then the state estimates are no 
longer the true maximum likelihood estimates and all guaran- 
tees of optimality are lost. 

The purpose of this section is to establish certain 
results and notation which will be needed in later chapters. 
An excellent reference on the subject of maximum likelihood 
state estimation is by Rauch (Ref. 26) . 

Let the linear dynamical system being observed be
defined by the recursive relationship

$$ x_k = \Phi(k,k-1)\, x_{k-1} + \Gamma_k w_k \qquad (\beta \times 1 \text{ vector}) \qquad (2.3.8) $$

and the linear noisy observations upon the system at time k
be defined by

$$ z_k = H_k x_k + v_k \qquad (\gamma \times 1 \text{ vector}) \qquad (2.3.9) $$
where $\delta_{ik} = 1$ if i = k and is zero otherwise.
The above conditional expectation operators are conditioned
upon the assumed values of the means and covariances of the
noises as well as their assumed independence.

Given the vector of n measurements $Z_n^T = (z_1^T, \ldots, z_n^T)$ and
an independent a priori estimate of the initial state, maximum likelihood estimation of the state $x_n$ is based upon
finding the particular value of the state which maximizes the
conditional probability density function of the state, given
all measurements of the state. Implicit in the definition of
the likelihood function is that all values of $R_k$ and $Q_k$,
k = 1,...,n, be known precisely, as well as the covariance of
the a priori state distribution, the elements of the state
transition matrices, the observation matrices, and the
forcing function matrices. To indicate this dependence of
the likelihood function on these parameters, some of the para- 
meters will appear as conditioning variables in the condi- 
tional likelihood function. This choice of parameters to 
thus indicate is motivated by the work of Chapter 3, when the 
values of certain parameters are to be estimated. 

It is convenient to work with the natural logarithm of
the likelihood function.

$$ L_n(x_n, Z_n, R, Q) = \ln f(x_n | Z_n, R, Q) \qquad (2.3.12) $$

where R and Q represent the known sequences of values
$R_1, \ldots, R_n$ and $Q_1, \ldots, Q_n$, the measurement and driving noise
covariances.

The conditional probability density function of the state
is found by use of Bayes' rule.

$$ \begin{aligned} f(x_n | Z_n, R, Q) &= \frac{f(x_n, Z_n, R, Q)}{f(Z_n, R, Q)} = \frac{f(z_n | Z_{n-1}, x_n, R, Q)\, f(Z_{n-1}, x_n, R, Q)}{f(z_n | Z_{n-1}, R, Q)\, f(Z_{n-1}, R, Q)} \\ &= \frac{f(x_n | Z_{n-1}, R, Q)\, f(z_n | Z_{n-1}, x_n, R, Q)}{f(z_n | Z_{n-1}, R, Q)} \end{aligned} \qquad (2.3.13) $$


On any one trial, the initial state $x_0$ is not a random
variable but assumes a certain value. However, this value is
not precisely known. To model this uncertainty in the value
of the initial state, $x_0$ is assumed to be a random variable
(over the ensemble of all possible initial conditions) having
a normal probability density function $f(x_0)$ with mean $\bar{x}_0$ and
covariance about the mean $P_{0|0}$. This distribution is presumed
to be known a priori. The a priori state estimate is taken
to be the mean of this distribution. Because of the symmetry
of $f(x_0)$ about its mean, $\bar{x}_0$ is also the point of maximum
probability of the distribution.

$$ e(x_0) = \bar{x}_0, \qquad e\left[ (x_0 - \bar{x}_0)(x_0 - \bar{x}_0)^T \right] = P_{0|0}, \qquad \hat{x}_{0|0} = \bar{x}_0 \ \text{(the a priori state estimate)} $$

The averaging here is performed over the ensemble of all
possible initial conditions and is conditioned upon knowledge
of $\bar{x}_0$ and $P_{0|0}$.

Let $\hat{x}_{n|n-1}$ be the maximum likelihood estimate of $x_n$
immediately before the nth measurement and let $P_{n|n-1}$ be the
conditional covariance of $x_n$ about its conditional mean $\hat{x}_{n|n-1}$.

$$ P_{n|n-1} = e\left[ (x_n - \hat{x}_{n|n-1})(x_n - \hat{x}_{n|n-1})^T \,|\, Z_{n-1}, R, Q \right] $$


The averaging here is over the ensemble of all possible 
measurement and driving noises and initial state conditions, 
all conditioned upon the values of R and Q. It can be shown 
that before the update at time n, the conditional probability density function of $x_n$ is

$$ f(x_n | Z_{n-1}, R, Q) = \frac{1}{(2\pi)^{\beta/2} |P_{n|n-1}|^{1/2}} \exp\left[ -\frac{1}{2} (x_n - \hat{x}_{n|n-1})^T P_{n|n-1}^{-1} (x_n - \hat{x}_{n|n-1}) \right] \qquad (2.3.14) $$


From (2.3.9)

$$ z_n = H_n x_n + v_n $$

Since $v_n$ is a normally distributed variable, independent of
$x_n$, and $x_n$ is also a normal variable, then $z_n$ is a normally
distributed variable with conditional mean

$$ e(z_n | Z_{n-1}, x_n, R, Q) = e(z_n | x_n) = H_n x_n $$

and conditional covariance

$$ e\left[ (z_n - H_n x_n)(z_n - H_n x_n)^T \,|\, Z_{n-1}, x_n, R, Q \right] = e(v_n v_n^T | R) = R_n $$

Therefore the conditional probability density function of $z_n$ is

$$ f(z_n | Z_{n-1}, x_n, R, Q) = \frac{1}{(2\pi)^{\gamma/2} |R_n|^{1/2}} \exp\left[ -\frac{1}{2} (z_n - H_n x_n)^T R_n^{-1} (z_n - H_n x_n) \right] \qquad (2.3.15) $$


and from (2.3.12) and (2.3.13)

$$ L_n(x_n, Z_n, R, Q) = \text{constant} - \frac{1}{2} \left[ (x_n - \hat{x}_{n|n-1})^T P_{n|n-1}^{-1} (x_n - \hat{x}_{n|n-1}) + (z_n - H_n x_n)^T R_n^{-1} (z_n - H_n x_n) \right] \qquad (2.3.16) $$

where "constant" includes all terms that are not functions
of $x_n$.
The maximum likelihood estimate $\hat{x}_{n|n}$ of $x_n$ is the value of
$x_n$ which maximizes $L_n$, or makes

$$ \left. \frac{\partial L_n}{\partial x_n} \right|_{x_n = \hat{x}_{n|n}} = 0 \qquad (2.3.17) $$


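It can be seen that

$$ \left. \frac{\partial L_n}{\partial x_n} \right|_{x_n = \hat{x}_{n|n}} = -(\hat{x}_{n|n} - \hat{x}_{n|n-1})^T P_{n|n-1}^{-1} + (z_n - H_n \hat{x}_{n|n})^T R_n^{-1} H_n \qquad (2.3.18) $$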


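Setting (2.3.18) to zero, transposing, and collecting the terms in $\hat{x}_{n|n}$, the solution of (2.3.17) is

$$ \hat{x}_{n|n} = \left( P_{n|n-1}^{-1} + H_n^T R_n^{-1} H_n \right)^{-1} \left( P_{n|n-1}^{-1} \hat{x}_{n|n-1} + H_n^T R_n^{-1} z_n \right) \qquad (2.3.19) $$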


Upon using the matrix inversion lemma (see Appendix A)

$$ \hat{x}_{n|n} = \hat{x}_{n|n-1} + A_n (z_n - H_n \hat{x}_{n|n-1}) \qquad (2.3.20) $$

where

$$ A_n = P_{n|n-1} H_n^T \left( R_n + H_n P_{n|n-1} H_n^T \right)^{-1} \qquad (2.3.21) $$

$A_n$ is called the optimum gain to the measurement residual
$(z_n - H_n \hat{x}_{n|n-1})$.

The conditional probability density function of $x_n$ after
the nth measurement can be shown to be

$$ f(x_n | Z_n, R, Q) = \frac{1}{(2\pi)^{\beta/2} |P_{n|n}|^{1/2}} \exp\left[ -\frac{1}{2} (x_n - \hat{x}_{n|n})^T P_{n|n}^{-1} (x_n - \hat{x}_{n|n}) \right] \qquad (2.3.22) $$

where $P_{n|n}$ is the conditional covariance of $x_n$ about $\hat{x}_{n|n}$
after the nth measurement. It can be shown that

$$ P_{n|n} = \left( P_{n|n-1}^{-1} + H_n^T R_n^{-1} H_n \right)^{-1} = (I - A_n H_n)\, P_{n|n-1}\, (I - A_n H_n)^T + A_n R_n A_n^T \qquad (2.3.23) $$
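The equivalence of the information form (2.3.19), the gain form (2.3.20)-(2.3.21), and the two expressions in (2.3.23) can be confirmed numerically. The Python sketch below uses a hypothetical two-state prediction with a scalar measurement. Small list-based matrix helpers take the place of a linear algebra library.

```python
def matmul(a, b):
    """Product of matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def add(a, b, sign=1.0):
    """a + sign * b, elementwise."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, c):
    return [[c * x for x in row] for row in a]


def inv2(a):
    """Inverse of a 2 x 2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


# Hypothetical prediction and measurement (two states, scalar measurement).
P = [[4.0, 1.0], [1.0, 2.0]]     # P_{n|n-1}
x_pred = [[1.0], [-0.5]]         # x_hat_{n|n-1}
H = [[1.0, 0.5]]                 # H_n, 1 x 2
R = 0.8                          # R_n, scalar
z = 2.3                          # z_n
I2 = [[1.0, 0.0], [0.0, 1.0]]

# Information form (2.3.19) and the first expression in (2.3.23).
P_inv = inv2(P)
P_info = inv2(add(P_inv, scale(matmul(transpose(H), H), 1.0 / R)))
x_info = matmul(P_info, add(matmul(P_inv, x_pred), scale(transpose(H), z / R)))

# Gain form (2.3.20)-(2.3.21); the residual covariance is a scalar here.
s = matmul(matmul(H, P), transpose(H))[0][0] + R
A = scale(matmul(P, transpose(H)), 1.0 / s)
x_gain = add(x_pred, scale(A, z - matmul(H, x_pred)[0][0]))

# Second expression in (2.3.23).
I_AH = add(I2, matmul(A, H), sign=-1.0)
P_joseph = add(matmul(matmul(I_AH, P), transpose(I_AH)),
               scale(matmul(A, transpose(A)), R))
```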


The necessary quantities for computing $\hat{x}_{n|n}$ can be
obtained recursively from the estimate at the previous time.

$$ \hat{x}_{n|n-1} = \Phi(n,n-1)\, \hat{x}_{n-1|n-1} \qquad (2.3.24) $$

$$ P_{n|n-1} = \Phi(n,n-1)\, P_{n-1|n-1}\, \Phi^T(n,n-1) + \Gamma_n Q_n \Gamma_n^T \qquad (2.3.25) $$


It should be noted that the above recursive state 
estimation equations are identical to those obtained by Kalman 
(Ref. 16) using the method of orthogonal projections and 
Lee (Ref. 20) using the method of weighted least squares. 

It is also easy to show that the state estimate is that esti- 
mate which minimizes the conditional covariance of the state 
estimation error at each stage of estimation. 

If no a priori information about the state is used, the
logarithm of the likelihood function is defined by

$$ L_n(x_n, Z_n, R, Q) = \ln f(Z_n | x_n, R, Q) \qquad (2.3.26) $$

where $f(Z_n | x_n, R, Q)$ is the joint conditional probability
density function of the measurements $Z_n$ given $x_n$, R, and Q.
By application of Bayes' rule

$$ f(Z_n | x_n, R, Q) = f(Z_{n-1} | x_n, R, Q)\, f(z_n | Z_{n-1}, x_n, R, Q) \qquad (2.3.27) $$

By repeated application of Bayes' rule, it can be shown that

$$ f(Z_n | x_n, R, Q) = \prod_{i=1}^{n} f(z_i | Z_{i-1}, x_n, R, Q) \qquad (2.3.28) $$


It can be anticipated that until a sufficient number of
measurements have been taken, the state estimate cannot be
defined and there is no unique solution of the likelihood
equations

$$ \left. \frac{\partial L_n(x_n, Z_n, R, Q)}{\partial x_n} \right|_{x_n = \hat{x}_{n|n}} = \left. \frac{\partial \ln f(Z_n | x_n, R, Q)}{\partial x_n} \right|_{x_n = \hat{x}_{n|n}} = 0 \qquad (2.3.29) $$

The problem is conveniently broken into two parts: obtaining
a minimal data set and then subsequent recursive estimation
using the equations previously derived. A minimal data set
is defined as the smallest set of measurements that is
necessary to completely define the state. That is, for
n less than some $n_0$, there is no unique solution of the likelihood
equations for the state $x_n$.

The derivation of the estimation equations when no 
a priori information is used is considerably more complicated 
than the case previously studied when a priori information 
was used. Only the results of the derivation will be pre- 
sented here. Fraser (Ref. 10) obtained the same equations 
given below using the criterion of minimum covariance. 




Prior to obtaining a minimal data set no unique estimate
of the state exists, so an auxiliary variable must be introduced. Define

$$ \hat{y}_{n|n} = F_{n|n}\, \hat{x}'_{n|n} \qquad (2.3.30) $$

$$ \hat{y}_{n|n-1} = F_{n|n-1}\, \hat{x}'_{n|n-1} \qquad (2.3.31) $$

where $\hat{x}'_{n|n}$ and $\hat{x}'_{n|n-1}$ are the state estimates obtained without
a priori information and $F_{n|n}$ and $F_{n|n-1}$ will be subsequently
defined. It can be shown that a unique $\hat{y}_{n|n}$ and
$\hat{y}_{n|n-1}$ exist at all times, but only if $F_{n|n}$ and $F_{n|n-1}$ are
of full rank and possess inverses do unique $\hat{x}'_{n|n}$ and $\hat{x}'_{n|n-1}$
exist.

Recursive equations for $\hat{y}_{n|n}$, $F_{n|n}$, $\hat{y}_{n|n-1}$, and $F_{n|n-1}$
can be obtained with initial conditions

$$ \hat{y}_{0|0} = 0, \qquad F_{0|0} = 0 $$

Subsequently,

$$ \hat{y}_{n|n-1} = (I - C_n \Gamma_n^T)\, \Phi^T(n-1,n)\, \hat{y}_{n-1|n-1} \qquad (2.3.32) $$

$$ \hat{y}_{n|n} = \hat{y}_{n|n-1} + H_n^T R_n^{-1} z_n \qquad (2.3.33) $$

$$ F_{n|n-1} = S_n - S_n \Gamma_n D_n^{-1} \Gamma_n^T S_n \qquad (2.3.34) $$

$$ F_{n|n} = F_{n|n-1} + H_n^T R_n^{-1} H_n \qquad (2.3.35) $$

where

$$ S_n = \Phi^T(n-1,n)\, F_{n-1|n-1}\, \Phi(n-1,n), \qquad D_n = Q_n^{-1} + \Gamma_n^T S_n \Gamma_n, \qquad C_n = S_n \Gamma_n D_n^{-1} $$


It can be shown that $F_{n|n-1}$ and $F_{n|n}$ are equal to the inverse of the state estimation error covariance matrix before and after the $n$th measurement, respectively. For $n < n_0$, $F_{n|n}$ is singular, implying that some or all elements of the error covariance matrix are infinite; this in turn implies that some or all of the elements of the state cannot be estimated on the basis of the measurements taken. However, once a minimal data set is obtained, the state estimate $x'_{n|n}$ can be obtained from the equation below.


$$ x'_{n|n} = F_{n|n}^{-1}\,y_{n|n} \tag{2.3.36} $$

Subsequently, the usual state estimation equations (2.3.20) and (2.3.24) can be used with the solution of the minimal data set (2.3.36) used as the initial state estimate and $F_{n|n}^{-1}$ used as the covariance of the initial state estimation error.
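The recursions (2.3.32) to (2.3.36) can be sketched in a few lines. This is a minimal NumPy illustration, not the author's program: it assumes time-invariant $\Phi$, $\Gamma$, $H$, $Q$, $R$, with $\Phi$ and $Q$ invertible (so $\Phi(n-1,n) = \Phi^{-1}$), and the function name `info_filter_step` and the constant-velocity example are hypothetical. Starting from $y_{0|0} = 0$, $F_{0|0} = 0$, it shows $F_{n|n}$ staying singular until a minimal data set has been collected.

```python
import numpy as np


def info_filter_step(y, F, z, Phi, Gamma, H, Q, R):
    """One cycle of (2.3.32)-(2.3.35); time-invariant matrices are assumed."""
    Phi_inv = np.linalg.inv(Phi)                  # Phi(n-1, n)
    S = Phi_inv.T @ F @ Phi_inv                   # S_n
    D = np.linalg.inv(Q) + Gamma.T @ S @ Gamma    # D_n (requires Q invertible)
    C = S @ Gamma @ np.linalg.inv(D)              # C_n
    I = np.eye(F.shape[0])
    y_pred = (I - C @ Gamma.T) @ Phi_inv.T @ y    # (2.3.32)
    F_pred = S - C @ Gamma.T @ S                  # (2.3.34)
    R_inv = np.linalg.inv(R)
    y_new = y_pred + H.T @ R_inv @ z              # (2.3.33)
    F_new = F_pred + H.T @ R_inv @ H              # (2.3.35)
    return y_new, F_new


# Hypothetical constant-velocity example: position measured, noise on velocity.
Phi = np.array([[1.0, 1.0], [0.0, 1.0]])
Gamma = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
Q = np.array([[0.1]])
R = np.array([[0.5]])

y = np.zeros(2)         # y_{0|0} = 0
F = np.zeros((2, 2))    # F_{0|0} = 0: no a priori information
ranks = []
for z in (1.0, 3.0):
    y, F = info_filter_step(y, F, np.array([z]), Phi, Gamma, H, Q, R)
    ranks.append(int(np.linalg.matrix_rank(F)))

# After two measurements F_{2|2} is invertible, so (2.3.36) applies.
x_prime = np.linalg.solve(F, y)
print(ranks, x_prime)
```

For the noise-free readings 1.0 and 3.0 used here, the first measurement leaves $F_{1|1}$ with rank 1, and after the second the solution of (2.3.36) is $x' = (3, 2)$, the last position and the implied velocity.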

The solution of the state estimation problem with no a priori information can be thought of as the limiting case of the solution with a priori information as $F_{0|0} \to 0$. In other words, the covariance of the a priori estimation error distribution becomes arbitrarily large and in the limit becomes infinite. This is equivalent to having no a priori information about the state.

The state estimate obtained using a priori information 
can be shown to be completely equivalent to a linear combi- 
nation of the state estimate obtained without use of a priori 
information and the propagated forward initial state estimate. 


$$ \hat{x}_{n|n} = P_{n|n}\left(P_{n|0}^{-1}\hat{x}_{n|0} + F_{n|n}\,x'_{n|n}\right) \tag{2.3.37} $$

where $\hat{x}_{n|n}$ is the combined state estimate,

$x'_{n|n}$ is the state estimate obtained without a priori information,

$\hat{x}_{n|0}$ is the propagated forward initial state estimate,

$$ \hat{x}_{n|0} = \Phi(n,0)\,\hat{x}_{0|0} $$

$P_{n|0}$ is the covariance of the propagated forward initial state estimation error,

$$ P_{n|0} = \Phi(n,0)P_{0|0}\Phi^T(n,0) + \sum_{i=1}^{n}\Phi(n,i)\Gamma_i Q_i \Gamma_i^T\Phi^T(n,i) $$

$P_{0|0}$ is the covariance of the a priori state distribution, and

$P_{n|n}$ is the covariance of the combined state estimation error,

$$ P_{n|n} = \left(P_{n|0}^{-1} + F_{n|n}\right)^{-1} $$

This result is also equivalent to setting the initial conditions on $y_{0|0}$ and $F_{0|0}$ to $P_{0|0}^{-1}\hat{x}_{0|0}$ and $P_{0|0}^{-1}$, respectively.
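The equivalence just stated can be checked numerically. The sketch below assumes a hypothetical time-invariant system and prior, runs the information recursion initialized with $y_{0|0} = P_{0|0}^{-1}\hat{x}_{0|0}$ and $F_{0|0} = P_{0|0}^{-1}$, and compares it with a conventional covariance-form Kalman filter. The helper `covariance_step` is a standard textbook cycle written for this comparison; it is not claimed to be the exact form of (2.3.20) and (2.3.24).

```python
import numpy as np


def info_step(y, F, z, Phi, Gamma, H, Q, R):
    """Information-form cycle (2.3.32)-(2.3.35), time-invariant matrices assumed."""
    Phi_inv = np.linalg.inv(Phi)
    S = Phi_inv.T @ F @ Phi_inv
    C = S @ Gamma @ np.linalg.inv(np.linalg.inv(Q) + Gamma.T @ S @ Gamma)
    I = np.eye(len(y))
    y = (I - C @ Gamma.T) @ Phi_inv.T @ y + H.T @ np.linalg.solve(R, z)
    F = (I - C @ Gamma.T) @ S + H.T @ np.linalg.solve(R, H)
    return y, F


def covariance_step(x, P, z, Phi, Gamma, H, Q, R):
    """Conventional covariance-form Kalman filter cycle, for comparison."""
    x = Phi @ x
    P = Phi @ P @ Phi.T + Gamma @ Q @ Gamma.T
    K = P @ H.T @ np.linalg.inv(R + H @ P @ H.T)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P


Phi = np.array([[1.0, 1.0], [0.0, 1.0]])
Gamma = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
Q, R = np.array([[0.1]]), np.array([[0.5]])
x0, P0 = np.array([0.5, 1.0]), np.diag([4.0, 1.0])   # hypothetical prior

x, P = x0.copy(), P0.copy()
y = np.linalg.solve(P0, x0)      # y_{0|0} = P_{0|0}^{-1} x_{0|0}
F = np.linalg.inv(P0)            # F_{0|0} = P_{0|0}^{-1}
for z in (1.2, 2.9, 5.1, 6.8):   # hypothetical measurements
    zv = np.array([z])
    x, P = covariance_step(x, P, zv, Phi, Gamma, H, Q, R)
    y, F = info_step(y, F, zv, Phi, Gamma, H, Q, R)

x_info, P_info = np.linalg.solve(F, y), np.linalg.inv(F)
```

The two filters agree to rounding error; the information form has the advantage that it also accepts $F_{0|0} = 0$.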

It can be shown that in most situations (when the state is completely observable by the measurements and controllable by the driving noise), as $n \to \infty$,

$$ P_{n|n}P_{n|0}^{-1} \to 0 $$

in which case, since (2.3.37) can be written $\hat{x}_{n|n} = x'_{n|n} + P_{n|n}P_{n|0}^{-1}(\hat{x}_{n|0} - x'_{n|n})$,

$$ \hat{x}_{n|n} \to x'_{n|n} $$

Thus, as would be expected, for large $n$ the effect of any initial state estimate becomes arbitrarily small.

If the true values of R and Q are not known precisely, then the measurement information cannot be processed optimally. Let $R^*$ and $Q^*$ represent the assumed values of the sequences R and Q, $\hat{x}^*_{n|n}$ represent the state estimate after n measurements using $R^*$ and $Q^*$ to compute the measurement residual gain matrices, and $P^*_{n|n}$ represent the "computed" state covariance matrix. Then


$$ \hat{x}^*_{n|n} = \hat{x}^*_{n|n-1} + A^*_n\left(z_n - H_n\hat{x}^*_{n|n-1}\right) \tag{2.3.38} $$

$$ P^*_{n|n} = (I - A^*_nH_n)P^*_{n|n-1}(I - A^*_nH_n)^T + A^*_nR^*_nA^{*T}_n \tag{2.3.39} $$

$$ A^*_n = P^*_{n|n-1}H_n^T\left(R^*_n + H_nP^*_{n|n-1}H_n^T\right)^{-1} \tag{2.3.40} $$

$$ \hat{x}^*_{n|n-1} = \Phi(n,n-1)\,\hat{x}^*_{n-1|n-1} \tag{2.3.41} $$

$$ P^*_{n|n-1} = \Phi(n,n-1)P^*_{n-1|n-1}\Phi^T(n,n-1) + \Gamma_nQ^*_n\Gamma_n^T \tag{2.3.42} $$

$P^*_{n|n}$ represents the conditional state covariance matrix after the $n$th measurement, conditioned upon the assumption that $R = R^*$ and $Q = Q^*$. If this assumption is not valid, then $P^*_{n|n}$ does not accurately represent the state covariance matrix. It can easily be shown that the actual conditional covariance matrix can be computed recursively using the following equations.


$$ P_{n|n} = (I - A^*_nH_n)P_{n|n-1}(I - A^*_nH_n)^T + A^*_nR_nA^{*T}_n \tag{2.3.43} $$

$$ P_{n|n-1} = \Phi(n,n-1)P_{n-1|n-1}\Phi^T(n,n-1) + \Gamma_nQ_n\Gamma_n^T \tag{2.3.44} $$


$P_{n|n}$ represents the state covariance matrix under the assumptions that $R^*$ and $Q^*$ are used to compute the filter gains (2.3.40) while the true values of the noise covariances are R and Q. If the initial state covariance is presumed to be known, then $P_{0|0} = P^*_{0|0}$. Unless $R = R^*$ and $Q = Q^*$, $P_{n|n}$ will not be equal to $P^*_{n|n}$. Depending upon the values of $R$, $R^*$, $Q$, $Q^*$, this deviation can be very significant. Numerical results of a computer simulation of these equations for a particular system are given in Chapter 6.
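The two covariance recursions can be run side by side. The sketch below assumes time-invariant matrices and a hypothetical constant-velocity system (it does not reproduce the Chapter 6 example). It forms the gain (2.3.40) from the assumed $Q^*$, $R^*$ and propagates both the computed covariance (2.3.39), (2.3.42) and the actual covariance (2.3.43), (2.3.44). Both measurement updates use the form $(I - A^*H)P(I - A^*H)^T + A^*RA^{*T}$, which remains valid when the gain is not optimal for the noise covariances being propagated.

```python
import numpy as np


def mismatched_covariances(Phi, Gamma, H, Q, R, Q_star, R_star, P0, n_steps):
    """Computed P*_{n|n} and actual P_{n|n} when gains use assumed Q*, R*.

    Time-invariant Phi, Gamma, H, Q, R are assumed; P_{0|0} = P*_{0|0} = P0.
    """
    I = np.eye(P0.shape[0])
    P_star, P = P0.copy(), P0.copy()
    for _ in range(n_steps):
        P_star_pred = Phi @ P_star @ Phi.T + Gamma @ Q_star @ Gamma.T          # (2.3.42)
        P_pred = Phi @ P @ Phi.T + Gamma @ Q @ Gamma.T                         # (2.3.44)
        A = P_star_pred @ H.T @ np.linalg.inv(R_star + H @ P_star_pred @ H.T)  # (2.3.40)
        J = I - A @ H
        P_star = J @ P_star_pred @ J.T + A @ R_star @ A.T                      # (2.3.39)
        P = J @ P_pred @ J.T + A @ R @ A.T                                     # (2.3.43)
    return P_star, P


Phi = np.array([[1.0, 1.0], [0.0, 1.0]])   # hypothetical constant-velocity model
Gamma = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
Q, R = np.array([[0.1]]), np.array([[0.5]])
P0 = np.diag([4.0, 1.0])

# Correct assumptions: the two covariances coincide.
P_star_ok, P_ok = mismatched_covariances(Phi, Gamma, H, Q, R, Q, R, P0, 30)
# Driving noise assumed 100 times too small.
P_star_bad, P_bad = mismatched_covariances(Phi, Gamma, H, Q, R, 0.01 * Q, R, P0, 30)
```

With $Q^* = Q$ and $R^* = R$ the two matrices are identical; with $Q^*$ set to one hundredth of $Q$ the actual position and velocity error variances exceed the values the filter computes for itself.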

Because of the linearity of the maximum likelihood equations in the state estimation problem, a strong statement can be made about the distribution of the estimation error $\tilde{x}^*_{n|n}$. From the form of the state estimation equations it can be seen that if the initial state distribution is normal, as well as the measurement and driving noises, then the state estimate is also a normal random variable. In order to completely specify the distribution of the estimation error, the mean and covariance of the distribution must be determined.

Conventionally, an estimator is said to be unbiased if 
over an ensemble of trials the expected value of the state 
estimate is equal to the expected value of the state. Impli- 
cit in this definition is averaging over the probability 
density functions of the measurement and driving noises as 
well as averaging over the ensemble of all initial conditions 
of the state. Even if incorrect values of R and Q are used 
to compute the measurement residual gain matrices, the 
state estimate remains unbiased in the above sense as long 
as the measurement gains are fixed numbers and are not 
random functions of the outcomes of the measurement process. 

The conditional expected value of the state estimate (2.3.38) can be computed recursively:

$$ \mathcal{E}(\hat{x}^*_{n|n}) = \mathcal{E}(\hat{x}^*_{n|n-1}) + \mathcal{E}\left[A^*_n\left(z_n - H_n\hat{x}^*_{n|n-1}\right)\right] \tag{2.3.45} $$

Under the assumption that $A^*_n$ is not a random variable under the expectation operator,

$$ \mathcal{E}\left[A^*_n\left(z_n - H_n\hat{x}^*_{n|n-1}\right)\right] = A^*_n\,\mathcal{E}\left(v_n - H_n\tilde{x}^*_{n|n-1}\right) = -A^*_nH_n\,\mathcal{E}(\tilde{x}^*_{n|n-1}) $$

where

$$ \tilde{x}^*_{n|n-1} = \hat{x}^*_{n|n-1} - x_n $$


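But from (2.3.41)

$$ \mathcal{E}(\hat{x}^*_{n|n-1}) = \Phi(n,n-1)\,\mathcal{E}(\hat{x}^*_{n-1|n-1}) \tag{2.3.46} $$

Since $\mathcal{E}(x_n) = \Phi(n,n-1)\,\mathcal{E}(x_{n-1})$ (the driving noise has zero mean) and

$$ \mathcal{E}(\hat{x}^*_{n|n-1}) = \mathcal{E}(x_n) + \mathcal{E}(\tilde{x}^*_{n|n-1}) $$

then

$$ \mathcal{E}(\hat{x}^*_{n|n}) = \mathcal{E}(x_n) + (I - A^*_nH_n)\Phi(n,n-1)\,\mathcal{E}(\tilde{x}^*_{n-1|n-1}) $$

Repeating the above procedure, it can be shown that

$$ \mathcal{E}(\hat{x}^*_{n|n}) = \mathcal{E}(x_n) + \left[\prod_{i=1}^{n}(I - A^*_iH_i)\Phi(i,i-1)\right]\mathcal{E}(\tilde{x}^*_{0|0}) \tag{2.3.47} $$

where the product is ordered with the $i = n$ factor on the left. With

$$ \mathcal{E}(\tilde{x}^*_{0|0}) = \mathcal{E}(\hat{x}^*_{0|0} - x_0) $$

and $\hat{x}^*_{0|0} = \mathcal{E}(x_0)$, then $\mathcal{E}(\tilde{x}^*_{0|0}) = 0$ and

$$ \mathcal{E}(\hat{x}^*_{n|n}) = \mathcal{E}(x_n) \quad \text{for all } n \tag{2.3.48} $$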
This result is independent of the values of $R$, $R^*$, $Q$, $Q^*$. The maximum likelihood state estimate remains unbiased for any values of $R^*$ and $Q^*$, but the covariance of such an estimate is a function of these quantities as expressed by (2.3.43) and (2.3.44). Thus it can be seen that over the ensemble of trials with all possible initial conditions, measurement noises, and driving noises, the state estimation error is zero mean normally distributed with covariance $P_{n|n}$ for any n.

Now the question is asked: Is the state estimate biased over the ensemble of trials with the same initial conditions? Or in other words, if the initial state were fixed and one averaged the estimate over all measurement and driving noises which might be experienced, would the state estimate be biased? The answer is yes if a priori information about the state is used and the initial state is different from the initial estimate. This can be shown in a fashion analogous to the previous work. Now all conditional expected values are additionally conditioned upon the value of $x_0$, the initial state. From (2.3.47),

$$ \mathcal{E}(\hat{x}^*_{n|n}\mid x_0) = \mathcal{E}(x_n\mid x_0) + \left[\prod_{i=1}^{n}(I - A^*_iH_i)\Phi(i,i-1)\right]\mathcal{E}(\tilde{x}^*_{0|0}\mid x_0) \tag{2.3.49} $$

Now

$$ \mathcal{E}(\tilde{x}^*_{0|0}\mid x_0) = \mathcal{E}\left[(\hat{x}^*_{0|0} - x_0)\mid x_0\right] = \hat{x}^*_{0|0} - x_0 $$

since averaging is not performed over $x_0$. Unless $x_0 = \hat{x}^*_{0|0}$, then

$$ \mathcal{E}(\hat{x}^*_{n|n}\mid x_0) \neq \mathcal{E}(x_n\mid x_0) $$

The bias of the estimator is due to the use of a priori information in the estimator. If no a priori information is used, it is easy to show that

$$ \mathcal{E}(x'_{n|n}\mid x_0) = \mathcal{E}(x_n\mid x_0) = \Phi(n,0)\,x_0 $$

However, even if initial information is used, as n becomes large the bias due to initial condition error becomes arbitrarily small. On the average, $\mathcal{E}(x_0) = \hat{x}^*_{0|0}$ and the estimator is unbiased as shown before. But over the ensemble of all possible trials with the same initial conditions, the estimate is only asymptotically unbiased. However, the distribution of the estimate about this possibly biased value can be shown to be normal for any n.
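Because the gains $A^*_i$ are deterministic, the conditional bias in (2.3.49) is the ordered product applied to the initial error $\hat{x}^*_{0|0} - x_0$. The sketch below propagates that bias for a hypothetical constant-velocity system with correctly tuned gains (an assumption made only for this illustration); the function name `conditional_bias` is illustrative.

```python
import numpy as np


def conditional_bias(Phi, Gamma, H, Q, R, P0, initial_error, n_steps):
    """Bias E(x*_{n|n} | x_0) - E(x_n | x_0) propagated through the product in (2.3.49).

    initial_error is x*_{0|0} - x_0; gains come from a correctly tuned filter
    (an assumption for this sketch) and are deterministic.
    """
    I = np.eye(P0.shape[0])
    P = P0.copy()
    bias = np.asarray(initial_error, dtype=float)
    norms = [np.linalg.norm(bias)]
    for _ in range(n_steps):
        P_pred = Phi @ P @ Phi.T + Gamma @ Q @ Gamma.T
        A = P_pred @ H.T @ np.linalg.inv(R + H @ P_pred @ H.T)
        P = (I - A @ H) @ P_pred
        bias = (I - A @ H) @ Phi @ bias        # multiply by the next factor (i = n)
        norms.append(np.linalg.norm(bias))
    return bias, norms


Phi = np.array([[1.0, 1.0], [0.0, 1.0]])
Gamma = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
Q, R = np.array([[0.1]]), np.array([[0.5]])
bias, norms = conditional_bias(Phi, Gamma, H, Q, R, np.diag([4.0, 1.0]), [2.0, -1.0], 50)
```

For this observable and controllable example the closed-loop factors $(I - A^*_iH_i)\Phi(i,i-1)$ are contracting, so the bias due to the initial condition error shrinks toward zero as $n$ grows.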

A slightly different definition of unbiasedness is used in Chapter 3 in the discussion of maximum likelihood estimators of more general parameters. There, an estimator $\hat{\alpha}_n$ of the true value of the parameters $\alpha$ is said to be unbiased if

$$ \mathcal{E}(\hat{\alpha}_n\mid\alpha_0) = \alpha_0 $$

where $\alpha_0$ is the true value of $\alpha$. This definition is really appropriate in situations when no a priori information about the parameters is used, so that the parameter estimate is a function of the measurements alone. However, the asymptotic behavior of the estimator will be shown to be independent of the a priori estimate, so that this definition is useful even if a priori information is used in obtaining the estimate.

Using this definition of unbiasedness, the maximum likelihood state estimate is unbiased if

$$ \mathcal{E}(\hat{x}^*_{n|n}\mid x_n) = x_n $$

Using a procedure similar to that used to obtain (2.3.47) and (2.3.49), it can be shown that

$$ \mathcal{E}(\hat{x}^*_{n|n}\mid x_n) = \mathcal{E}(x_n\mid x_n) + \left[\prod_{i=1}^{n}(I - A^*_iH_i)\Phi(i,i-1)\right]\mathcal{E}(\tilde{x}^*_{0|0}\mid x_n) \tag{2.3.50} $$

But

$$ \mathcal{E}(\tilde{x}^*_{0|0}\mid x_n) = \mathcal{E}\left[(\hat{x}^*_{0|0} - x_0)\mid x_n\right] = \hat{x}^*_{0|0} - \Phi(0,n)\,x_n $$

and $\mathcal{E}(x_n\mid x_n) = x_n$. Unless $\hat{x}^*_{0|0} = \Phi(0,n)\,x_n$, the maximum likelihood estimator is biased. But as before, if one looks at the asymptotic behavior of the estimator or studies an estimator which does not use a priori information about the state, then

$$ \mathcal{E}(\hat{x}^*_{n|n}\mid x_n) = x_n $$

and the estimator is unbiased.
Now the question is asked: What is the effect of possible biases in the measurement and driving noises, and what can be done to estimate these biases? In such a situation, the system state is given by the relationship

$$ x_n = \Phi(n,n-1)\,x_{n-1} + \Gamma_n(w_n + B_w) \tag{2.3.51} $$

where, as before, $w_n$ is a zero mean random variable with covariance $Q_n$, with $w_n$ independent of $w_k$ for $k \neq n$. $B_w$ is a constant bias independent of $w_n$, with

$$ \mathcal{E}(B_w) = 0, \qquad \mathcal{E}(B_wB_w^T) = \sigma^2_{B_w} $$

These conditional expected values are taken over the ensemble of all possible driving noise bias values. It is usually assumed that over the above mentioned ensemble, $B_w$ is normally distributed with zero mean and covariance $\sigma^2_{B_w}$.

B 

W 

The measurement $z_n$ is given by

$$ z_n = H_nx_n + v_n + B_v \tag{2.3.52} $$

where, as before, $v_n$ is a zero mean random variable with covariance $R_n$, with $v_n$ independent of $v_k$ for $k \neq n$. $B_v$ is a constant measurement bias independent of $v_n$ and the driving noise bias $B_w$, with

$$ \mathcal{E}(B_v) = 0, \qquad \mathcal{E}(B_vB_v^T) = \sigma^2_{B_v} $$

These conditional expected values are taken over the ensemble of all possible measurement noise bias values. Again it is usually assumed that $B_v$ is normally distributed.

If the state $x_n$ is estimated with the effects of these biases neglected, then the state estimate $\hat{x}^*_{n|n}$ is computed using (2.3.38) and (2.3.41), with the "computed" covariance matrix given by (2.3.39) and (2.3.42). It is assumed that the values of R and Q used to compute these matrices and the measurement residual gains are the correct values. Now, however, the state estimate will not be an optimal estimate, and $P^*_{n|n}$ will not correspond to the actual state estimation error covariance because of the neglected biases.

From (2.3.51) and (2.3.41) it can be seen that

$$ \tilde{x}^*_{n|n-1} = \Phi(n,n-1)\,\tilde{x}^*_{n-1|n-1} - \Gamma_n(w_n + B_w) \tag{2.3.53} $$

Then the actual state estimation error covariance matrix before the measurement at time n is given by

$$ \begin{aligned} P_{n|n-1} &= \mathcal{E}(\tilde{x}^*_{n|n-1}\tilde{x}^{*T}_{n|n-1}) \\ &= \Phi(n,n-1)P_{n-1|n-1}\Phi^T(n,n-1) + \Gamma_n(Q_n + \sigma^2_{B_w})\Gamma_n^T \\ &\quad - \Phi(n,n-1)\,\mathcal{E}(\tilde{x}^*_{n-1|n-1}B_w^T)\,\Gamma_n^T - \Gamma_n\,\mathcal{E}(B_w\tilde{x}^{*T}_{n-1|n-1})\,\Phi^T(n,n-1) \end{aligned} \tag{2.3.54} $$

So in order to compute $P_{n|n-1}$, the correlation between the driving noise bias $B_w$ and the state estimation error $\tilde{x}^*_{n-1|n-1}$ must be determined. This will be done subsequently.

From (2.3.52) and (2.3.38) it can be seen that

$$ \tilde{x}^*_{n|n} = \tilde{x}^*_{n|n-1} + A^*_n\left(v_n + B_v - H_n\tilde{x}^*_{n|n-1}\right) \tag{2.3.55} $$

Then the actual state estimation error covariance matrix after the measurement at time n is given by

$$ \begin{aligned} P_{n|n} &= \mathcal{E}(\tilde{x}^*_{n|n}\tilde{x}^{*T}_{n|n}) \\ &= P_{n|n-1} + A^*_n\left(R_n + \sigma^2_{B_v} + H_nP_{n|n-1}H_n^T - H_n\,\mathcal{E}(\tilde{x}^*_{n|n-1}B_v^T) - \mathcal{E}(B_v\tilde{x}^{*T}_{n|n-1})H_n^T\right)A^{*T}_n \\ &\quad - P_{n|n-1}H_n^TA^{*T}_n - A^*_nH_nP_{n|n-1} + A^*_n\,\mathcal{E}(B_v\tilde{x}^{*T}_{n|n-1}) + \mathcal{E}(\tilde{x}^*_{n|n-1}B_v^T)\,A^{*T}_n \end{aligned} \tag{2.3.56} $$

The correlation between $B_v$ and $\tilde{x}^*_{n|n-1}$ must be determined in order to evaluate $P_{n|n}$.

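Multiplying (2.3.53) by $B_v^T$ and performing the conditional expected value,

$$ \mathcal{E}(\tilde{x}^*_{n|n-1}B_v^T) = \Phi(n,n-1)\,\mathcal{E}(\tilde{x}^*_{n-1|n-1}B_v^T) \tag{2.3.57} $$

since it is assumed that $B_v$ is independent of $w_n$ and $B_w$. The quantities $\mathcal{E}(\tilde{x}^*_{n|n}B_v^T)$ and $\mathcal{E}(\tilde{x}^*_{n|n}B_w^T)$ can be computed recursively.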

Multiplying (2.3.55) by $B_v^T$ and performing the expected value,

$$ \begin{aligned} \mathcal{E}(\tilde{x}^*_{n|n}B_v^T) &= (I - A^*_nH_n)\,\mathcal{E}(\tilde{x}^*_{n|n-1}B_v^T) + A^*_n\sigma^2_{B_v} \\ &= (I - A^*_nH_n)\Phi(n,n-1)\,\mathcal{E}(\tilde{x}^*_{n-1|n-1}B_v^T) + A^*_n\sigma^2_{B_v} \end{aligned} \tag{2.3.58} $$

Multiplying (2.3.55) by $B_w^T$ and performing the expected value,

$$ \mathcal{E}(\tilde{x}^*_{n|n}B_w^T) = (I - A^*_nH_n)\,\mathcal{E}(\tilde{x}^*_{n|n-1}B_w^T) \tag{2.3.59} $$

since $B_w$ is assumed to be independent of $v_n$ and $B_v$. But from (2.3.53) it can be seen that

$$ \mathcal{E}(\tilde{x}^*_{n|n-1}B_w^T) = \Phi(n,n-1)\,\mathcal{E}(\tilde{x}^*_{n-1|n-1}B_w^T) - \Gamma_n\sigma^2_{B_w} \tag{2.3.60} $$

So (2.3.59) becomes

$$ \mathcal{E}(\tilde{x}^*_{n|n}B_w^T) = (I - A^*_nH_n)\Phi(n,n-1)\,\mathcal{E}(\tilde{x}^*_{n-1|n-1}B_w^T) - (I - A^*_nH_n)\Gamma_n\sigma^2_{B_w} \tag{2.3.61} $$


It is assumed that the initial state estimation error is independent of $B_w$ and $B_v$, so the initial conditions on (2.3.58) and (2.3.61) are

$$ \mathcal{E}(\tilde{x}^*_{0|0}B_w^T) = \mathcal{E}(\tilde{x}^*_{0|0}B_v^T) = 0 $$

Using an analysis similar to that previously given, it can be shown that across the ensemble of all possible initial state conditions, measurement and driving noises, and measurement and driving noise biases, the state estimate $\hat{x}^*_{n|n}$ is unbiased. However, if the biases are present, the actual state estimation error covariance matrix is no longer accurately represented by $P^*_{n|n}$ but rather by $P_{n|n}$ as given above.
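The recursions (2.3.54) to (2.3.61) can be sketched as follows for a filter that uses the correct $Q$ and $R$ but ignores constant biases. This is an illustration under stated assumptions (time-invariant matrices, hypothetical bias variances), not the author's program. The measurement update is written as $(I - A^*H)P(I - A^*H)^T + A^*(R + \sigma^2_{B_v})A^{*T}$ plus the two cross terms, an algebraic regrouping of (2.3.56); the names `Cw`, `Cv`, `S_w`, `S_v` are illustrative.

```python
import numpy as np


def actual_covariance_with_biases(Phi, Gamma, H, Q, R, S_w, S_v, P0, n_steps):
    """Computed P* and actual P when constant biases B_w, B_v are ignored.

    S_w, S_v are the bias covariances sigma^2_{B_w}, sigma^2_{B_v}. The filter
    gain uses the correct Q and R but no bias terms. Cw and Cv hold the
    cross-correlations E(x~*_{n|n} B_w^T) and E(x~*_{n|n} B_v^T).
    """
    nx = P0.shape[0]
    I = np.eye(nx)
    P_star, P = P0.copy(), P0.copy()
    Cw = np.zeros((nx, S_w.shape[0]))    # initial conditions: zero
    Cv = np.zeros((nx, S_v.shape[0]))
    for _ in range(n_steps):
        # Gain and "computed" covariance, biases neglected: (2.3.39)-(2.3.42)
        P_star_pred = Phi @ P_star @ Phi.T + Gamma @ Q @ Gamma.T
        A = P_star_pred @ H.T @ np.linalg.inv(R + H @ P_star_pred @ H.T)
        J = I - A @ H
        P_star = J @ P_star_pred @ J.T + A @ R @ A.T
        # (2.3.54): actual covariance before the measurement
        P_pred = (Phi @ P @ Phi.T + Gamma @ (Q + S_w) @ Gamma.T
                  - Phi @ Cw @ Gamma.T - Gamma @ Cw.T @ Phi.T)
        # (2.3.57) and (2.3.60): cross-correlations before the measurement
        Cv_pred = Phi @ Cv
        Cw_pred = Phi @ Cw - Gamma @ S_w
        # (2.3.56), regrouped around J = I - A*H
        P = (J @ P_pred @ J.T + A @ (R + S_v) @ A.T
             + J @ Cv_pred @ A.T + A @ Cv_pred.T @ J.T)
        # (2.3.58) and (2.3.61): cross-correlations after the measurement
        Cv = J @ Cv_pred + A @ S_v
        Cw = J @ Cw_pred
    return P_star, P


Phi = np.array([[1.0, 1.0], [0.0, 1.0]])   # hypothetical system
Gamma = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
Q, R = np.array([[0.1]]), np.array([[0.5]])
P0 = np.diag([4.0, 1.0])
no_bias = np.zeros((1, 1))

P_star_a, P_a = actual_covariance_with_biases(Phi, Gamma, H, Q, R, no_bias, no_bias, P0, 30)
P_star_b, P_b = actual_covariance_with_biases(
    Phi, Gamma, H, Q, R, np.array([[0.01]]), np.array([[0.2]]), P0, 30)
```

With zero bias variances the actual covariance collapses to the computed one; with nonzero bias variances the computed $P^*_{n|n}$ is unchanged while the actual position error variance is larger.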

If there is a possibility that biases may be present in 
the measurement or driving noises, then it is usually prefer- 
able to estimate their values so that their effect upon the 
state estimator is diminished. This can easily be accomplished 
within the framework of maximum likelihood state estimation 
already established. 

Define a new state variable

$$ s_n^T = (x_n^T,\ B_w^T,\ B_v^T) \tag{2.3.62} $$

and a new state transition matrix

$$ \Psi(n,n-1) = \begin{pmatrix} \Phi(n,n-1) & \Gamma_n & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix} \tag{2.3.63} $$

and a new forcing function matrix

$$ X_n = \begin{pmatrix} \Gamma_n \\ 0 \\ 0 \end{pmatrix} \tag{2.3.64} $$

Then the augmented state $s_n$ obeys the recursive relationship

$$ s_n = \Psi(n,n-1)\,s_{n-1} + X_nw_n \tag{2.3.65} $$

Define a new observation matrix

$$ G_n = (H_n,\ 0,\ I) \tag{2.3.66} $$

Then the measurement $z_n$ is given by

$$ z_n = G_ns_n + v_n \tag{2.3.67} $$


Now the problem is reduced to exactly the same form as the case when the noises were zero mean, except that now the state vector is of increased dimension and includes all possible noise biases. The estimator for the augmented state $s_n$ can be formulated in exactly the same way as before with initial conditions

$$ \hat{s}_{0|0}^T = (\hat{x}_{0|0}^T,\ 0,\ 0) $$

This says that the a priori estimates of the biases should always be zero since, if they were nonzero, they could be removed, with the residual uncertainty in the bias values then zero mean.

The covariance of the initial augmented state estimation error is given by

$$ E_{0|0} = \mathcal{E}(\tilde{s}_{0|0}\tilde{s}_{0|0}^T) = \begin{pmatrix} P_{0|0} & 0 & 0 \\ 0 & \sigma^2_{B_w} & 0 \\ 0 & 0 & \sigma^2_{B_v} \end{pmatrix} $$

where $P_{0|0}$ is the covariance of the unaugmented state estimate, $\sigma^2_{B_w}$ is the covariance of the driving noise bias, and $\sigma^2_{B_v}$ is the covariance of the measurement noise bias.

Thus the augmented state can be estimated using the same form of the equations as for the unaugmented state with the substitutions

$$ H_n \to G_n, \qquad \Gamma_n \to X_n, \qquad \Phi(n,n-1) \to \Psi(n,n-1), \qquad P_{n|n} \to E_{n|n}, \qquad P_{n|n-1} \to E_{n|n-1} $$

If the true covariances of the random parts of the noises, as well as the covariances of the bias parts of the noises, are known precisely and used in the filter, then it can be shown that $E_{n|n}$ accurately represents the covariance of the augmented state estimation error, and the filter is optimal in a minimum covariance or maximum likelihood sense.
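A minimal sketch of the augmented-state bias estimator of (2.3.62) to (2.3.67), assuming time-invariant matrices and hypothetical noise and bias variances. `augment_for_biases` builds $\Psi$, $X_n$, $G_n$, and `kalman_step` is a standard covariance-form cycle with the substitutions listed above; both names are illustrative. Because $B_w$ and $B_v$ are constant components of $s_n$, their posterior variances in $E_{n|n}$ cannot increase from step to step. In this particular example $z_n$ depends on position only through $p_n + B_v$, so the variance of $B_v$ shrinks but is not driven to zero.

```python
import numpy as np


def augment_for_biases(Phi, Gamma, H):
    """Psi, X and G of (2.3.63), (2.3.64), (2.3.66) for s = (x, B_w, B_v)."""
    nx, nw = Gamma.shape
    nz = H.shape[0]
    Psi = np.block([
        [Phi, Gamma, np.zeros((nx, nz))],
        [np.zeros((nw, nx)), np.eye(nw), np.zeros((nw, nz))],
        [np.zeros((nz, nx)), np.zeros((nz, nw)), np.eye(nz)],
    ])
    X = np.vstack([Gamma, np.zeros((nw, nw)), np.zeros((nz, nw))])
    G = np.hstack([H, np.zeros((nz, nw)), np.eye(nz)])
    return Psi, X, G


def kalman_step(s, E, z, Psi, X, G, Q, R):
    """Standard Kalman cycle with the substitutions H->G, Gamma->X, Phi->Psi."""
    s = Psi @ s
    E = Psi @ E @ Psi.T + X @ Q @ X.T
    K = E @ G.T @ np.linalg.inv(R + G @ E @ G.T)
    s = s + K @ (z - G @ s)
    E = (np.eye(len(s)) - K @ G) @ E
    return s, E


Phi = np.array([[1.0, 1.0], [0.0, 1.0]])
Gamma = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
Q, R = np.array([[0.01]]), np.array([[0.25]])
sig2_Bw, sig2_Bv = 0.04, 1.0          # hypothetical bias variances
Psi, X, G = augment_for_biases(Phi, Gamma, H)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=2)                   # true initial state, P_{0|0} = I
B_w = rng.normal(0.0, np.sqrt(sig2_Bw), size=1)    # true (constant) biases
B_v = rng.normal(0.0, np.sqrt(sig2_Bv), size=1)
s = np.zeros(4)                                    # (x_{0|0}, 0, 0) with x_{0|0} = 0
E = np.diag([1.0, 1.0, sig2_Bw, sig2_Bv])          # block-diagonal E_{0|0}
bw_var = [E[2, 2]]
for n in range(40):
    x = Phi @ x + Gamma @ (rng.normal(0.0, np.sqrt(Q[0, 0]), size=1) + B_w)   # (2.3.51)
    z = H @ x + rng.normal(0.0, np.sqrt(R[0, 0]), size=1) + B_v               # (2.3.52)
    s, E = kalman_step(s, E, z, Psi, X, G, Q, R)
    bw_var.append(E[2, 2])
```

After 40 steps `s[2]` and `s[3]` hold the bias estimates, and `bw_var` records the shrinking variance of the driving noise bias.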

If, instead of the measurement and driving noises having a bias, they have a component which is correlated with past noises, then a slightly different approach must be used. Only a limited type of correlation is easily treated, so the following definitions are made.

It is assumed that the state obeys the relationship

$$ x_n = \Phi(n,n-1)\,x_{n-1} + \Gamma_n(w_n + w^c_n) \tag{2.3.68} $$

where $w_n$ is uncorrelated zero mean noise such that

$$ \mathcal{E}(w_nw_j^T) = Q_n\delta_{jn} \tag{2.3.69} $$

and $w^c_n$ is correlated zero mean noise such that

$$ \mathcal{E}(w^c_nw^{cT}_j) = Q_c\,e^{-|n-j|/\tau_w} \tag{2.3.70} $$

$\tau_w$ is the "correlation time" of the driving noise. It is also assumed that $w_n$ and $w^c_n$ are mutually uncorrelated, so that

$$ \mathcal{E}(w_nw^{cT}_j) = 0 \tag{2.3.71} $$

The correlated noise $w^c_n$ can be generated by considering $w^c_n$ to be composed of two parts:


$$ w^c_n = w^*_n + (e^{-1/\tau_w})\,w^c_{n-1} \tag{2.3.72} $$

where $w^*_n$ is a zero mean random noise that is independent of all past noises, with

$$ \mathcal{E}(w^*_nw^{*T}_n) = Q_c\left(1 - e^{-2/\tau_w}\right) \tag{2.3.73} $$

It is easy to show that the correlated noise defined by (2.3.72) has the proper correlation between the noises at different times as given by (2.3.70).
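The claim that (2.3.72) with (2.3.73) reproduces (2.3.70) can be checked by simulation. The sketch below assumes a scalar $w^c_n$ with hypothetical values $Q_c = 2$ and $\tau_w = 5$, starts the sequence in its stationary distribution, and compares sample autocovariances with $Q_c e^{-|k|/\tau_w}$; the function names are illustrative.

```python
import numpy as np


def correlated_noise(n_samples, Q_c, tau, rng):
    """Scalar w^c_n generated by (2.3.72)-(2.3.73), started in steady state."""
    a = np.exp(-1.0 / tau)
    innovations = rng.normal(0.0, np.sqrt(Q_c * (1.0 - np.exp(-2.0 / tau))), size=n_samples)
    w = np.empty(n_samples)
    w[0] = rng.normal(0.0, np.sqrt(Q_c))      # assumed stationary start
    for n in range(1, n_samples):
        w[n] = innovations[n] + a * w[n - 1]
    return w


def sample_autocov(w, lag):
    """Sample estimate of E(w_n w_{n-lag}) for a zero-mean sequence."""
    return np.mean(w[lag:] * w[:len(w) - lag])


rng = np.random.default_rng(0)
Q_c, tau = 2.0, 5.0
w = correlated_noise(200_000, Q_c, tau, rng)
for lag in (0, 1, 5, 10):
    print(lag, sample_autocov(w, lag), Q_c * np.exp(-lag / tau))
```

The sample and theoretical columns should agree to within sampling error. The innovation variance $Q_c(1 - e^{-2/\tau_w})$ is exactly what keeps the stationary variance equal to $Q_c$.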

It is also assumed that the measurement $z_n$ is given by

$$ z_n = H_nx_n + v_n + v^c_n \tag{2.3.74} $$

where $v_n$ is uncorrelated zero mean noise such that

$$ \mathcal{E}(v_nv_j^T) = R_n\delta_{jn} \tag{2.3.75} $$

and $v^c_n$ is correlated zero mean noise such that

$$ \mathcal{E}(v^c_nv^{cT}_j) = R_c\,e^{-|n-j|/\tau_v} \tag{2.3.76} $$

$\tau_v$ is the "correlation time" of the measurement noise. It is again assumed that $v_n$ and $v^c_n$ are mutually uncorrelated, with the further assumption that all measurement noises are uncorrelated with all driving noises.

Again it is convenient to define the correlated measurement noise by

$$ v^c_n = v^*_n + (e^{-1/\tau_v})\,v^c_{n-1} \tag{2.3.77} $$

where $v^*_n$ is a zero mean random noise that is independent of all past noises, with

$$ \mathcal{E}(v^*_nv^{*T}_n) = R_c\left(1 - e^{-2/\tau_v}\right) \tag{2.3.78} $$

It is easy to show that the correlated measurement noise defined by (2.3.77) has the proper correlation between the noises at different times as given by (2.3.76).

It should be noted that when the correlation time of 
the noises becomes very large, the correlated noises approach 
constant biases, whereas as the correlation times become 
small, the noises become uncorrelated. 

If it is assumed that the state $x_n$ is estimated neglecting this correlation, the state estimate $\hat{x}^*_{n|n}$ is computed using (2.3.38) and (2.3.41), with the "computed" covariance matrix given by (2.3.39) and (2.3.42). Again $\hat{x}^*_{n|n}$ will not be an optimal estimate, and $P^*_{n|n}$ will not correspond to the actual state estimation error covariance matrix because of the neglected correlation in the noises.

From (2.3.68) and (2.3.41) it can be seen that

$$ \tilde{x}^*_{n|n-1} = \Phi(n,n-1)\,\tilde{x}^*_{n-1|n-1} - \Gamma_n(w_n + w^c_n) \tag{2.3.79} $$

Then the actual state estimation error covariance matrix before the measurement at time n is given by

$$ \begin{aligned} P_{n|n-1} &= \Phi(n,n-1)P_{n-1|n-1}\Phi^T(n,n-1) + \Gamma_n(Q_n + Q_c)\Gamma_n^T \\ &\quad - \Gamma_n\,\mathcal{E}(w^c_n\tilde{x}^{*T}_{n-1|n-1})\,\Phi^T(n,n-1) - \Phi(n,n-1)\,\mathcal{E}(\tilde{x}^*_{n-1|n-1}w^{cT}_n)\,\Gamma_n^T \end{aligned} \tag{2.3.80} $$

In order to compute $P_{n|n-1}$, the correlation between the driving noise $w^c_n$ and the state estimation error $\tilde{x}^*_{n-1|n-1}$ must be computed. This will be done subsequently.
From (2.3.74) and (2.3.38) it can be seen that

$$ \tilde{x}^*_{n|n} = \tilde{x}^*_{n|n-1} + A^*_n\left(v_n + v^c_n - H_n\tilde{x}^*_{n|n-1}\right) \tag{2.3.81} $$

Then the actual state estimation error covariance matrix after the measurement at time n is given by

$$ \begin{aligned} P_{n|n} &= (I - A^*_nH_n)P_{n|n-1}(I - A^*_nH_n)^T + A^*_n(R_n + R_c)A^{*T}_n \\ &\quad + A^*_n\,\mathcal{E}(v^c_n\tilde{x}^{*T}_{n|n-1})(I - A^*_nH_n)^T + (I - A^*_nH_n)\,\mathcal{E}(\tilde{x}^*_{n|n-1}v^{cT}_n)\,A^{*T}_n \end{aligned} \tag{2.3.82} $$

The correlation between $v^c_n$ and $\tilde{x}^*_{n|n-1}$ must be computed in order to evaluate $P_{n|n}$.

Multiplying (2.3.79) by $v^{cT}_n$ and performing the conditional expected value,

$$ \mathcal{E}(\tilde{x}^*_{n|n-1}v^{cT}_n) = \Phi(n,n-1)\,\mathcal{E}(\tilde{x}^*_{n-1|n-1}v^{cT}_n) \tag{2.3.83} $$

since it is assumed that $v^c_n$ is independent of the driving noises $w_n$ and $w^c_n$. But using (2.3.77) plus the independence of $v^*_n$,

$$ \mathcal{E}(\tilde{x}^*_{n-1|n-1}v^{cT}_n) = (e^{-1/\tau_v})\,\mathcal{E}(\tilde{x}^*_{n-1|n-1}v^{cT}_{n-1}) \tag{2.3.84} $$

Similarly it can be seen that

$$ \mathcal{E}(\tilde{x}^*_{n-1|n-1}w^{cT}_n) = (e^{-1/\tau_w})\,\mathcal{E}(\tilde{x}^*_{n-1|n-1}w^{cT}_{n-1}) \tag{2.3.85} $$

The quantities $\mathcal{E}(\tilde{x}^*_{n-1|n-1}v^{cT}_{n-1})$ and $\mathcal{E}(\tilde{x}^*_{n-1|n-1}w^{cT}_{n-1})$ can be computed recursively. Multiplying (2.3.81) by $v^{cT}_n$ and performing the expected value,


$$ \begin{aligned} \mathcal{E}(\tilde{x}^*_{n|n}v^{cT}_n) &= (I - A^*_nH_n)\,\mathcal{E}(\tilde{x}^*_{n|n-1}v^{cT}_n) + A^*_nR_c \\ &= (I - A^*_nH_n)\Phi(n,n-1)(e^{-1/\tau_v})\,\mathcal{E}(\tilde{x}^*_{n-1|n-1}v^{cT}_{n-1}) + A^*_nR_c \end{aligned} \tag{2.3.86} $$

Multiplying (2.3.81) by $w^{cT}_n$ and performing the expected value,

$$ \begin{aligned} \mathcal{E}(\tilde{x}^*_{n|n}w^{cT}_n) &= (I - A^*_nH_n)\,\mathcal{E}(\tilde{x}^*_{n|n-1}w^{cT}_n) \\ &= (I - A^*_nH_n)\left[\Phi(n,n-1)\,\mathcal{E}(\tilde{x}^*_{n-1|n-1}w^{cT}_n) - \Gamma_nQ_c\right] \\ &= (I - A^*_nH_n)\left[\Phi(n,n-1)(e^{-1/\tau_w})\,\mathcal{E}(\tilde{x}^*_{n-1|n-1}w^{cT}_{n-1}) - \Gamma_nQ_c\right] \end{aligned} \tag{2.3.87} $$

It is assumed that the initial state estimation error is uncorrelated with the measurement and driving noises, so the initial conditions on the recursive equations (2.3.86) and (2.3.87) are

$$ \mathcal{E}(\tilde{x}^*_{0|0}v^{cT}_0) = \mathcal{E}(\tilde{x}^*_{0|0}w^{cT}_0) = 0 $$

By analogy with the estimation of possible noise biases, it is possible to estimate the correlated part of the measurement and driving noises.
Define a new state variable

$$ s_n^T = (x_n^T,\ w^{cT}_n,\ v^{cT}_n) \tag{2.3.88} $$

and a new state transition matrix

$$ \Psi(n,n-1) = \begin{pmatrix} \Phi(n,n-1) & (e^{-1/\tau_w})\,\Gamma_n & 0 \\ 0 & (e^{-1/\tau_w})\,I & 0 \\ 0 & 0 & (e^{-1/\tau_v})\,I \end{pmatrix} \tag{2.3.89} $$

and a new forcing function matrix

$$ X_n = \begin{pmatrix} \Gamma_n & \Gamma_n & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix} \tag{2.3.90} $$

and a new "driving noise" vector

$$ u_n^T = (w_n^T,\ w^{*T}_n,\ v^{*T}_n) \tag{2.3.91} $$

It can be seen that the new state $s_n$ satisfies the relationship

$$ s_n = \Psi(n,n-1)\,s_{n-1} + X_nu_n \tag{2.3.92} $$

and the measurement $z_n$ is given by

$$ z_n = G_ns_n + v_n \tag{2.3.93} $$

where $G_n$ is defined by

$$ G_n = (H_n,\ 0,\ I) \tag{2.3.94} $$

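Now the problem is reduced to exactly the same form as the cases when the noises are uncorrelated, except that now the state vector is of increased dimension and includes all possible correlated noises. The estimator for the augmented state $s_n$ can be formulated in exactly the same way as before with initial conditions

$$ \hat{s}_{0|0}^T = (\hat{x}_{0|0}^T,\ 0,\ 0) $$

The covariance of the initial augmented state estimation error is given by

$$ E_{0|0} = \mathcal{E}(\tilde{s}_{0|0}\tilde{s}_{0|0}^T) = \begin{pmatrix} P_{0|0} & 0 & 0 \\ 0 & Q_c & 0 \\ 0 & 0 & R_c \end{pmatrix} $$

Thus the augmented state can be estimated using the same form of the equations as for the unaugmented state without correlated noises, with $u_n$ in place of $w_n$. Under the independence assumptions above, (2.3.69), (2.3.73), and (2.3.78) give $u_n$ a block-diagonal covariance with blocks $Q_n$, $Q_c(1 - e^{-2/\tau_w})$, and $R_c(1 - e^{-2/\tau_v})$.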

If the true covariances of the correlated and uncorrelated parts of the noises, as well as the proper correlation times, are known precisely and used in the filter, then it can be shown that $E_{n|n}$ as computed by the filter accurately represents the covariance of the augmented state estimation error, and the filter is optimal in a minimum covariance or maximum likelihood sense.
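The augmented model (2.3.88) to (2.3.94) can be assembled and checked against the direct noise recursions. The sketch below assumes time-invariant matrices and hypothetical values; `augment_for_correlated_noise` builds $\Psi$, $X_n$, $G_n$, and `augmented_noise_cov` builds the block-diagonal covariance of $u_n$ under the independence assumptions stated above (both names are illustrative). The check propagates one hypothetical step directly through (2.3.68), (2.3.72), (2.3.77) and through (2.3.92).

```python
import numpy as np


def augment_for_correlated_noise(Phi, Gamma, H, tau_w, tau_v):
    """Psi, X and G of (2.3.89), (2.3.90), (2.3.94) for s = (x, w^c, v^c)."""
    nx, nw = Gamma.shape
    nz = H.shape[0]
    aw, av = np.exp(-1.0 / tau_w), np.exp(-1.0 / tau_v)
    Z = np.zeros
    Psi = np.block([
        [Phi, aw * Gamma, Z((nx, nz))],
        [Z((nw, nx)), aw * np.eye(nw), Z((nw, nz))],
        [Z((nz, nx)), Z((nz, nw)), av * np.eye(nz)],
    ])
    X = np.block([
        [Gamma, Gamma, Z((nx, nz))],
        [Z((nw, nw)), np.eye(nw), Z((nw, nz))],
        [Z((nz, nw)), Z((nz, nw)), np.eye(nz)],
    ])
    G = np.hstack([H, Z((nz, nw)), np.eye(nz)])
    return Psi, X, G


def augmented_noise_cov(Q, Q_c, R_c, tau_w, tau_v):
    """Covariance of u_n = (w_n, w*_n, v*_n), assuming mutually independent parts."""
    nw, nz = Q.shape[0], R_c.shape[0]
    U = np.zeros((2 * nw + nz, 2 * nw + nz))
    U[:nw, :nw] = Q                                                 # (2.3.69)
    U[nw:2 * nw, nw:2 * nw] = Q_c * (1.0 - np.exp(-2.0 / tau_w))    # (2.3.73)
    U[2 * nw:, 2 * nw:] = R_c * (1.0 - np.exp(-2.0 / tau_v))        # (2.3.78)
    return U


Phi = np.array([[1.0, 1.0], [0.0, 1.0]])
Gamma = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
tau_w, tau_v = 5.0, 3.0
Psi, X, G = augment_for_correlated_noise(Phi, Gamma, H, tau_w, tau_v)
U = augmented_noise_cov(np.array([[0.01]]), np.array([[2.0]]), np.array([[0.5]]), tau_w, tau_v)

# One hypothetical step, computed directly and through the augmented model.
x_prev, wc_prev, vc_prev = np.array([1.0, -0.5]), np.array([0.3]), np.array([-0.2])
w, w_star, v_star = np.array([0.1]), np.array([0.05]), np.array([0.02])
wc = w_star + np.exp(-1.0 / tau_w) * wc_prev          # (2.3.72)
x = Phi @ x_prev + Gamma @ (w + wc)                   # (2.3.68)
vc = v_star + np.exp(-1.0 / tau_v) * vc_prev          # (2.3.77)
s = (Psi @ np.concatenate([x_prev, wc_prev, vc_prev])
     + X @ np.concatenate([w, w_star, v_star]))       # (2.3.92)
```

With `Psi`, `X`, `G`, `U`, and $E_{0|0}$ in hand, the augmented state can be run through the same covariance-form Kalman cycle used for the bias case.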





Chapter 3 




MAXIMUM LIKELIHOOD ESTIMATION 
OF NOISE COVARIANCE PARAMETERS 
AND THE SYSTEM STATE 


3.1 Introduction

In Chapter 2 the theory of maximum likelihood estimation 
was briefly discussed and then applied to the problem of 
state estimation. The resulting equations were derived under 
the assumption that the probability density functions of the 
measurement and driving noises as well as the initial state 
probability density function are known a priori. It was 
shown that if the second order statistics of the noises are 
not known precisely, the state estimation becomes suboptimal. 

The purpose of this chapter is to utilize the concepts of
maximum likelihood to remove the restriction that R and Q be
known precisely a priori in order to obtain an optimal state
estimate.

In Section 3.2 important definitions are given and a summary of some classical results of maximum likelihood estimation is presented. These results concern the asymptotic properties of maximum likelihood estimators, but they cannot be directly applied to the problem of state and noise covariance estimation.

In Section 3.3 the likelihood functions appropriate for the solution of a set of closely related problems are derived, all of which concern the estimation of the noise covariance parameters. Section 3.4 is devoted to demonstrating the asymptotic properties of these estimators.

The remainder of this chapter concerns the application 
of the theoretical results to the problem of state and noise 
covariance estimation. 

3.2 Summary of Previous Results in Maximum Likelihood Estimation

Maximum likelihood estimation has been studied by many 
authors and many useful results have been obtained concerning 
the properties of maximum likelihood estimators. These 
results apply directly only to a limited set of problems, 
when the measurements are independent and identically 
distributed. However, they provide a base upon which the 
analysis of more general problems can rest. The purpose 
of this section is to summarize the important results and 
definitions which will be needed to extend the analysis to 
more general problems. 

First several important definitions must be made. 

These definitions apply equally well to any situation when 
the values of certain parameters are to be estimated on the 
basis of observations of a random variable which is a function 
of these parameters. They are not limited to situations 
when the criterion of maximum likelihood is used to define 
the estimate. 

The estimator of the true value of the parameter $\alpha$ is an observable random variable, say $\hat{\alpha}_n(z_1,\ldots,z_n)$, which is a function of the sample elements $(z_1,\ldots,z_n)$ and whose distribution is, in some sense, concentrated about the true value of $\alpha$. As in linear estimation, it will be found that the covariance of the estimate is often a reasonable criterion for measuring the concentration. If the realized (observed) value of $\hat{\alpha}_n$ corresponding to a realized (observed) value of $(z_1,\ldots,z_n)$ is used for $\alpha_0$, the true value of $\alpha$, then the random variable $\hat{\alpha}_n$ is called a point estimate or estimator for $\alpha_0$. This use of $\hat{\alpha}_n$ normally would be made, of course, only when the value of $\alpha_0$ is unknown.

If, when $\alpha = \alpha_0$, $\mathcal{E}(\hat{\alpha}_n\mid\alpha_0) = \alpha_0$, then $\hat{\alpha}_n$ is called an unbiased estimator for $\alpha_0$. This is the last definition of unbiasedness that was used in Chapter 2 in the discussion of maximum likelihood state estimation.

If an estimator $\hat{\alpha}_n$ converges to $\alpha_0$ as $n \to \infty$, it is called a consistent estimator for $\alpha_0$. A sufficient condition for $\hat{\alpha}_n$ to be a consistent estimator is that it be unbiased (or asymptotically unbiased) and have a covariance which goes to zero as $n \to \infty$.

If $\hat{\alpha}_n$ is an unbiased estimator for $\alpha_0$ having finite covariance and has the further property that no other unbiased estimator has a smaller covariance than $\hat{\alpha}_n$, it is called an efficient estimator.

The following results of maximum likelihood estimation have been obtained by Rao (Ref. 25), Wilks (Ref. 37), and Deutsch (Ref. 6) after certain assumptions have been made about the nature of the likelihood function.

Let $Z_n^T = (z_1^T,\ldots,z_n^T)$ be a vector of n independent identically distributed observations and $\alpha$ be the $m \times 1$ vector of parameters being estimated. Then the joint conditional probability density function of $Z_n$ can be found by application of Bayes' rule:

$$ f(Z_n\mid\alpha) = f(Z_{n-1}\mid\alpha)\,f(z_n\mid Z_{n-1},\alpha) \tag{3.2.1} $$

where $f(z_n\mid Z_{n-1},\alpha)$ is the conditional probability density function of $z_n$ given $Z_{n-1}$ and $\alpha$. Because of the assumed independence of the $z_i$,

$$ f(z_n\mid Z_{n-1},\alpha) = f(z_n\mid\alpha) \tag{3.2.2} $$

By repeated application of Bayes' rule, it can be seen that

$$ f(Z_n\mid\alpha) = \prod_{i=1}^{n} f(z_i\mid\alpha) \tag{3.2.3} $$

It is assumed that the likelihood function is chosen to be the probability density function (3.2.3), in which case the natural logarithm of the likelihood function has the form

$$ L_n(Z_n,\alpha) = \ln f(Z_n\mid\alpha) = \sum_{i=1}^{n}\ln f(z_i\mid\alpha) \tag{3.2.4} $$

Then

$$ \frac{\partial L_n(Z_n,\alpha)}{\partial\alpha} = \sum_{i=1}^{n}\frac{\partial \ln f(z_i\mid\alpha)}{\partial\alpha} \tag{3.2.5} $$
i=l 
As stated in Chapter 2, maximum likelihood estimation is concerned with finding the value of the parameters $\hat{\alpha}_n$ such that

$$ \left.\frac{\partial L_n(Z_n,\alpha)}{\partial\alpha}\right|_{\alpha = \hat{\alpha}_n} = 0 $$
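As a concrete instance of (3.2.3) to (3.2.5), consider independent scalar measurements $z_i \sim N(0, \sigma^2)$ with the single unknown parameter $\alpha = \sigma^2$ (a hypothetical example, chosen because it is the scalar analogue of a noise covariance). The total score is $\sum_i[-1/(2\sigma^2) + z_i^2/(2\sigma^4)] = (n/(2\sigma^4))(\overline{z^2} - \sigma^2)$, so the likelihood equation has the unique root $\hat{\sigma}^2 = \overline{z^2}$. The sketch evaluates $L_n$ as a sum of single-measurement terms and checks the analytic score against a finite difference.

```python
import numpy as np


def log_likelihood(z, var):
    """L_n(Z_n, alpha) of (3.2.4) for z_i ~ N(0, var), alpha = var (assumed model)."""
    return np.sum(-0.5 * np.log(2.0 * np.pi * var) - z**2 / (2.0 * var))


def total_score(z, var):
    """dL_n/dvar as the sum of single-measurement scores, as in (3.2.5)."""
    return np.sum(-0.5 / var + z**2 / (2.0 * var**2))


rng = np.random.default_rng(0)
z = rng.normal(0.0, np.sqrt(3.0), size=500)   # hypothetical data, true variance 3

# Setting the total score to zero gives the closed-form root mean(z^2).
var_hat = np.mean(z**2)

# Finite-difference check of the score at an arbitrary trial value.
trial, h = var_hat + 0.5, 1e-6
numeric = (log_likelihood(z, trial + h) - log_likelihood(z, trial - h)) / (2.0 * h)
```

The score is positive below $\hat{\sigma}^2$ and negative above it, so this root is the maximum of $L_n$.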


For notational convenience, define

$$ f_n = f(Z_n\mid\alpha), \qquad f = f(z\mid\alpha) $$

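The following assumptions are made about the likelihood function.

1) The derivatives $\partial L_n/\partial\alpha$, $\partial^2 L_n/\partial\alpha\,\partial\alpha^T$, and $\partial^3 L_n/\partial\alpha_i\,\partial\alpha_j\,\partial\alpha_k$ exist for almost all $Z_n$ in an interval $\Omega$ of $\alpha$.

2) $$ \mathcal{E}\left[\frac{1}{f_n}\frac{\partial f_n}{\partial\alpha}\,\middle|\,\alpha_0\right] = 0, \qquad \mathcal{E}\left[\frac{1}{f_n}\frac{\partial^2 f_n}{\partial\alpha\,\partial\alpha^T}\,\middle|\,\alpha_0\right] = 0 $$

3) $$ \mathcal{E}\left[\frac{1}{f_n^2}\frac{\partial f_n}{\partial\alpha}\frac{\partial f_n}{\partial\alpha^T}\,\middle|\,\alpha_0\right] \text{ is positive definite} $$

4) For every $\alpha$ in $\Omega$,

$$ \left|\frac{1}{n}\frac{\partial^3 \ln f_n}{\partial\alpha_i\,\partial\alpha_j\,\partial\alpha_k}\right| < M(Z_n) $$

with $\mathcal{E}[M(Z_n)\mid\alpha_0] < K$ for some $K$ which is independent of $\alpha$ and $n$.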
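Define

$$ S(z,\alpha) = \frac{\partial \ln f}{\partial\alpha}, \quad \text{the } m \times 1 \text{ single measurement score} $$

$$ S_n(Z_n,\alpha) = \frac{\partial \ln f_n}{\partial\alpha}, \quad \text{the } m \times 1 \text{ total measurement score} $$

$$ J(\alpha_0,\alpha) = \mathcal{E}\left[S(z,\alpha)S^T(z,\alpha)\,\middle|\,\alpha_0\right] = \int S(z,\alpha)S^T(z,\alpha)\,f(z\mid\alpha_0)\,dz $$

$$ J_n(\alpha_0,\alpha) = \mathcal{E}\left[S_n(Z_n,\alpha)S_n^T(Z_n,\alpha)\,\middle|\,\alpha_0\right] = \int\!\cdots\!\int S_n(Z_n,\alpha)S_n^T(Z_n,\alpha)\,f(Z_n\mid\alpha_0)\,dZ_n $$

$$ J(\alpha_0) = J(\alpha_0,\alpha_0), \quad \text{the } m \times m \text{ single measurement conditional information matrix} $$

$$ J_n(\alpha_0) = J_n(\alpha_0,\alpha_0), \quad \text{the } m \times m \text{ total measurement conditional information matrix} $$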


The following theorems are from Wilks. The proofs will not be repeated here, but they will be discussed subsequently.

Asymptotic Distribution of the Score

Suppose $(z_1, \ldots, z_n)$ is a sample from the probability density function $f(z \mid \alpha_0)$. Let $f(z \mid \alpha)$ possess finite first derivatives with respect to $\alpha$ in the range $\Omega$. Then if $J_n(\alpha, \alpha)$ is positive definite for $\alpha$ in $\Omega$, the total measurement score $S_n(Z_n, \alpha_0)$ is asymptotically distributed for large $n$ as a zero mean normal random variable with covariance $J_n(\alpha_0)$.

Convergence of the Maximum Likelihood Estimator

Suppose $(z_1, \ldots, z_n)$ is a sample from the probability density function $f(z \mid \alpha_0)$, where $f(z \mid \alpha)$ possesses finite first derivatives with respect to $\alpha$ in $\Omega$. Let $S_j(z, \alpha)$, the $j$th component of the vector $S(z, \alpha)$, be a continuous function of $\alpha$ in $\Omega$ for all values of $z$ except possibly for a set of zero probability. Then there exists a sequence of solutions of

$$ S_n(Z_n, \alpha) = 0 \tag{3.2.6} $$

which converges almost certainly to $\alpha_0$. If the solution is a unique vector $\hat{\alpha}_n$ for $n \ge$ some $n_0$, the sequence of vectors $\hat{\alpha}_n$ converges almost certainly to $\alpha_0$ as $n \to \infty$.

Asymptotic Distribution of the Maximum Likelihood Estimator

If $(z_1, \ldots, z_n)$ is a sample from the probability density function $f(z \mid \alpha_0)$, where $f(z \mid \alpha)$ possesses finite first and second derivatives with respect to $\alpha$ in the range $\Omega$, and if the maximum likelihood estimator satisfying (3.2.6) is unique for $n \ge$ some $n_0$, then it is asymptotically normally distributed for large $n$ with mean $\alpha_0$ and covariance $[J_n(\alpha_0)]^{-1}$.

Thus, under the assumptions previously given, the maximum likelihood estimator of the parameters $\alpha_0$ is asymptotically unbiased and normally distributed for any value of $\alpha_0$ in the range $\Omega$, with

$$ E(\hat{\alpha}_n \mid \alpha_0) = \alpha_0, \qquad E[(\hat{\alpha}_n - \alpha_0)(\hat{\alpha}_n - \alpha_0)^T \mid \alpha_0] = [J_n(\alpha_0)]^{-1} $$

Now the distribution of the estimation error over the ensemble of all possible true values of $\alpha_0$ is sought. An analytic expression for the unconditional probability density function of $\hat{\alpha}_n$ cannot be found in most situations. Formally

$$ f(\hat{\alpha}_n) = \int_{\Omega} f(\hat{\alpha}_n, \alpha_0)\, d\alpha_0 = \int_{\Omega} f(\hat{\alpha}_n \mid \alpha_0)\, f(\alpha_0)\, d\alpha_0 $$

Even if $f(\hat{\alpha}_n \mid \alpha_0)$ is a normal density function, the above integral usually cannot be evaluated in closed form for any nontrivial $f(\alpha_0)$. However, even if the unconditional distribution of $\hat{\alpha}_n$ is not known, two useful moments of the distribution, the mean and covariance, can be evaluated.

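The unconditional mean of the estimate is defined by

$$ E(\hat{\alpha}_n) = \int_{\Omega} E(\hat{\alpha}_n \mid \alpha_0)\, f(\alpha_0)\, d\alpha_0 = \int_{\Omega} \alpha_0\, f(\alpha_0)\, d\alpha_0 = \bar{\alpha}_0 $$

where $\bar{\alpha}_0$ is the mean of the distribution $f(\alpha_0)$.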

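The unconditional covariance of the estimate is defined by

$$ \operatorname{cov}(\hat{\alpha}_n) = E[(\hat{\alpha}_n - E(\hat{\alpha}_n))(\hat{\alpha}_n - E(\hat{\alpha}_n))^T] = E[(\hat{\alpha}_n - \bar{\alpha}_0)(\hat{\alpha}_n - \bar{\alpha}_0)^T] $$

But

$$
\begin{aligned}
E[(\hat{\alpha}_n - \bar{\alpha}_0)(\hat{\alpha}_n - \bar{\alpha}_0)^T] ={}& E[(\hat{\alpha}_n - \alpha_0)(\hat{\alpha}_n - \alpha_0)^T] + E[(\hat{\alpha}_n - \alpha_0)(\alpha_0 - \bar{\alpha}_0)^T] \\
&+ E[(\alpha_0 - \bar{\alpha}_0)(\hat{\alpha}_n - \alpha_0)^T] + E[(\alpha_0 - \bar{\alpha}_0)(\alpha_0 - \bar{\alpha}_0)^T]
\end{aligned}
$$

and, conditioning on $\alpha_0$,

$$ E[(\hat{\alpha}_n - \alpha_0)(\alpha_0 - \bar{\alpha}_0)^T] = E\{ E[\hat{\alpha}_n - \alpha_0 \mid \alpha_0]\, (\alpha_0 - \bar{\alpha}_0)^T \} = 0 $$

and likewise for its transpose. So

$$ \operatorname{cov}(\hat{\alpha}_n) = E[(\hat{\alpha}_n - \alpha_0)(\hat{\alpha}_n - \alpha_0)^T] + E[(\alpha_0 - \bar{\alpha}_0)(\alpha_0 - \bar{\alpha}_0)^T] $$

But

$$ E[(\hat{\alpha}_n - \alpha_0)(\hat{\alpha}_n - \alpha_0)^T] = E\{ E[(\hat{\alpha}_n - \alpha_0)(\hat{\alpha}_n - \alpha_0)^T \mid \alpha_0] \} = E\{ [J_n(\alpha_0)]^{-1} \} \equiv \bar{J}_n^{-1} $$

and $E[(\alpha_0 - \bar{\alpha}_0)(\alpha_0 - \bar{\alpha}_0)^T] = \operatorname{cov}(\alpha_0)$, the covariance of the $\alpha_0$ distribution. Then

$$ \operatorname{cov}(\hat{\alpha}_n) = \bar{J}_n^{-1} + \operatorname{cov}(\alpha_0) $$

$\bar{J}_n^{-1}$ represents the mean square estimation error matrix, which for any nontrivial $f(\alpha_0)$ generally cannot be evaluated in closed form. Formally

$$ \bar{J}_n^{-1} = \int_{\Omega} [J_n(\alpha_0)]^{-1} f(\alpha_0)\, d\alpha_0 $$

There are several approximate techniques for evaluating this integral, which are discussed in Section 3.7.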
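As a rough numerical check of the unconditional covariance result, the sketch below uses a hypothetical scalar example that does not come from the text. Here $\alpha_0$ is the mean of a normal population with known standard deviation $\sigma$, $\alpha_0$ is itself drawn from a normal prior, and $\hat{\alpha}_n$ is the sample mean, which is the maximum likelihood estimate in this case. Because $J_n(\alpha_0) = n/\sigma^2$ does not depend on $\alpha_0$, the mean square error matrix $E\{[J_n(\alpha_0)]^{-1}\}$ is simply $\sigma^2/n$. The function name and all numerical values are illustrative assumptions.

```python
import numpy as np

def unconditional_variance(n=25, sigma=2.0, prior_mean=1.0, prior_std=0.5,
                           trials=100_000, seed=0):
    """Monte Carlo mean and variance of a_hat_n over the ensemble of true values a_0.

    Hypothetical example: z_i ~ N(a_0, sigma^2), a_0 ~ N(prior_mean, prior_std^2),
    and a_hat_n is the sample mean (the ML estimate of a Gaussian mean).
    """
    rng = np.random.default_rng(seed)
    a0 = rng.normal(prior_mean, prior_std, size=trials)      # draw a_0 from f(a_0)
    z = rng.normal(a0[:, None], sigma, size=(trials, n))     # n observations per a_0
    a_hat = z.mean(axis=1)                                   # ML estimate in each trial
    predicted = sigma**2 / n + prior_std**2                  # E[J_n^-1] + cov(a_0)
    return a_hat.mean(), a_hat.var(), predicted

mean_hat, var_hat, var_pred = unconditional_variance()
print(f"E(a_hat) ~ {mean_hat:.3f}, var ~ {var_hat:.4f}, predicted {var_pred:.4f}")
```

The empirical mean should lie close to the prior mean $\bar{\alpha}_0$, and the empirical variance should lie close to the sum of the two terms.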

3.3 Derivation of the Likelihood Function

In this section several closely related problems are studied, and the likelihood function appropriate to each is derived. It will be shown that the asymptotic behavior of the solutions of each problem is the same, so that if the asymptotic behavior of any one is found, the results can be applied to the others. The notation and definitions of Section 2.3 are used, with the additional assumption that the measurement and driving noise covariance matrices are diagonal and time invariant. The technique of maximum likelihood estimation is not restricted to cases in which this assumption is valid, but the estimation problem becomes much more complicated if the assumption is not made. A discussion of the problem when this restriction is not employed is given in Chapter 7.

Estimation of Noise Covariance Parameters with No 
A Priori Noise Covariance Information 

The first problem considered is estimating the diagonal elements of the measurement and driving noise covariance matrices without the use of a priori information about these quantities. The maximum likelihood estimate of the noise covariance parameters is defined by

$$ l(\hat{R}_n, \hat{Q}_n, Z_n) = \max_{R, Q}\, l(R, Q, Z_n) \tag{3.3.1} $$

where $l(R, Q, Z_n)$ is the likelihood function, which is chosen to be the conditional probability density function

$$ l(R, Q, Z_n) = f(Z_n \mid R, Q) \tag{3.3.2} $$


By application of Bayes' rule

$$ f(Z_n \mid R, Q) = f(Z_{n-1} \mid R, Q)\, f(z_n \mid Z_{n-1}, R, Q) $$

Repeating the above procedure to find $f(Z_{n-1} \mid R, Q)$, it can be shown that

$$ f(Z_n \mid R, Q) = \prod_{i=1}^{n} f(z_i \mid Z_{i-1}, R, Q) \tag{3.3.3} $$

where $f(z_i \mid Z_{i-1}, R, Q)$ is the conditional probability density function of $z_i$ given $Z_{i-1}$, $R$, and $Q$.

Using the results of Section 2.3, it can be shown that $z_i$ is a normally distributed random variable with conditional mean

$$ E(z_i \mid Z_{i-1}, R, Q) = H_i \hat{x}_{i|i-1} $$

and conditional covariance

$$ E(\Delta z_i\, \Delta z_i^T \mid Z_{i-1}, R, Q) = R + H_i P_{i|i-1} H_i^T $$

where

$$ \Delta z_i = z_i - H_i \hat{x}_{i|i-1} $$

$\hat{x}_{i|i-1}$ is the maximum likelihood estimate of $x_i$ after $i-1$ measurements using the true values of $R$ and $Q$ to compute the proper filter gains, and $P_{i|i-1}$ is the conditional covariance of $x_i$ about $\hat{x}_{i|i-1}$:

$$ P_{i|i-1} = E[(x_i - \hat{x}_{i|i-1})(x_i - \hat{x}_{i|i-1})^T \mid Z_{i-1}, R, Q] $$

It is assumed that a priori information about the state is used in forming the above state estimates, so that a unique $\hat{x}_{i|i-1}$ exists for all $i$. Define

$$ B_i = R + H_i P_{i|i-1} H_i^T $$

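Then the conditional probability density function of $z_i$ is given by

$$ f(z_i \mid Z_{i-1}, R, Q) = \frac{1}{(2\pi)^{\gamma/2}\, |B_i|^{1/2}} \exp\left( -\tfrac{1}{2}\, \Delta z_i^T B_i^{-1} \Delta z_i \right) \tag{3.3.4} $$

where $\gamma$ is the dimension of the measurement vector. As in Chapter 2, it is convenient to work with the natural logarithm of the likelihood function (3.3.2):

$$ L_n(R, Q, Z_n) = \ln l(R, Q, Z_n) $$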
After algebraic manipulation,

$$ L_n(R, Q, Z_n) = \text{constant} - \frac{1}{2} \sum_{i=1}^{n} \left[ \ln |B_i| + \Delta z_i^T B_i^{-1} \Delta z_i \right] \tag{3.3.5} $$

where "constant" includes all terms that are not functions of $R$ or $Q$.
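Equation (3.3.5) is the residual (innovations) decomposition of the Gaussian likelihood. Up to the same dropped constant, it should therefore agree with the log of the joint normal density of all $n$ measurements. The minimal sketch below checks this for a hypothetical scalar random walk, $x_{i+1} = x_i + w_i$ and $z_i = x_i + v_i$, so that $H_i = 1$ and the transition is 1. It assumes that the a priori mean and variance of $x_1$ are known. The function names are illustrative and are not taken from the text.

```python
import numpy as np

def innovations_loglik(z, R, Q, x_pred, P_pred):
    """(3.3.5) without its constant, for the hypothetical scalar random walk
    x_{i+1} = x_i + w_i, z_i = x_i + v_i, var(w_i) = Q, var(v_i) = R.
    x_pred and P_pred are the a priori mean and variance of x_1."""
    L = 0.0
    for zi in z:
        B = P_pred + R                       # B_i = R + H_i P_{i|i-1} H_i^T with H_i = 1
        dz = zi - x_pred                     # residual Delta z_i
        L -= 0.5 * (np.log(B) + dz * dz / B)
        K = P_pred / B                       # Kalman gain
        x_filt = x_pred + K * dz
        P_filt = (1.0 - K) * P_pred
        x_pred, P_pred = x_filt, P_filt + Q  # one-step prediction with transition 1
    return L

def batch_loglik(z, R, Q, x_pred, P_pred):
    """Log of the joint normal density of (z_1, ..., z_n), same constant dropped."""
    n = len(z)
    k = np.arange(n)
    # Cov(z_i, z_j) = P_1 + Q * min(i, j) + R * [i == j]  (0-based indices)
    C = P_pred + Q * np.minimum.outer(k, k) + R * np.eye(n)
    r = np.asarray(z, dtype=float) - x_pred
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (logdet + r @ np.linalg.solve(C, r))

rng = np.random.default_rng(1)
n = 50
z = np.cumsum(rng.normal(0.0, 0.3, n)) + rng.normal(0.0, 1.0, n)
print(innovations_loglik(z, 1.0, 0.09, 0.0, 4.0))
print(batch_loglik(z, 1.0, 0.09, 0.0, 4.0))
```

The recursive form needs only scalar operations per measurement. The batch form requires factoring an $n \times n$ covariance matrix, which is why the recursive form is preferred in practice.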

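It is convenient to introduce an auxiliary variable

$$ \xi^T = (R^{11}, \ldots, R^{\gamma\gamma}, Q^{11}, \ldots, Q^{\eta\eta}) $$

$\xi$ is the $(\gamma + \eta) \times 1$ vector of the diagonal elements of the $\gamma \times \gamma$ matrix $R$ and the $\eta \times \eta$ matrix $Q$.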

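The likelihood equations are obtained by equating the derivatives of $L_n(R, Q, Z_n)$ with respect to $\xi$ to zero. Using the identities of Appendix A, after algebraic manipulation,

$$ \frac{\partial L_n}{\partial \xi_j} = -\frac{1}{2} \sum_{i=1}^{n} \operatorname{Tr}\left[ \left( B_i^{-1} - B_i^{-1} \Delta z_i \Delta z_i^T B_i^{-1} \right) \frac{\partial B_i}{\partial \xi_j} - 2 B_i^{-1} \Delta z_i \left( \frac{\partial \hat{x}_{i|i-1}}{\partial \xi_j} \right)^T H_i^T \right] \tag{3.3.6} $$

$\hat{\xi}_n$ is found as the solution of

$$ \left. \frac{\partial L_n}{\partial \xi} \right|_{\xi = \hat{\xi}_n} = 0 \tag{3.3.7} $$

In general there is no closed form solution of (3.3.7) for $\hat{\xi}_n$, so an iterative solution like those described in Section 3.6 must be employed.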
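The iterative procedures actually proposed are those of Section 3.6. Purely to show what solving (3.3.7) means, the sketch below maximizes (3.3.5) by brute force over a logarithmic grid of $(R, Q)$ values for the same hypothetical scalar random walk. Using a grid search is an assumption made here for transparency. It is not the method of Section 3.6, and the names, grid, and simulated data are illustrative.

```python
import numpy as np

def loglik_grid(z, R_grid, Q_grid, x1_mean=0.0, x1_var=10.0):
    """(3.3.5), constant dropped, for every (R, Q) pair of a grid, for the
    hypothetical scalar random walk x_{i+1} = x_i + w_i, z_i = x_i + v_i.
    One Kalman filter per grid point is run in parallel with numpy arrays."""
    R, Q = np.meshgrid(R_grid, Q_grid, indexing="ij")
    x_pred = np.full(R.shape, x1_mean)
    P_pred = np.full(R.shape, x1_var)
    L = np.zeros(R.shape)
    for zi in z:
        B = P_pred + R
        dz = zi - x_pred
        L -= 0.5 * (np.log(B) + dz**2 / B)
        K = P_pred / B
        x_pred = x_pred + K * dz
        P_pred = (1.0 - K) * P_pred + Q
    return L

def grid_ml(z, R_grid, Q_grid):
    """Crude stand-in for an iterative solution of (3.3.7): pick the grid maximum."""
    L = loglik_grid(z, R_grid, Q_grid)
    i, j = np.unravel_index(np.argmax(L), L.shape)
    return R_grid[i], Q_grid[j]

rng = np.random.default_rng(2)
n, R_true, Q_true = 1000, 1.0, 0.1
x = np.cumsum(rng.normal(0.0, np.sqrt(Q_true), n))
z = x + rng.normal(0.0, np.sqrt(R_true), n)
grid = np.logspace(-2, 1, 61)
R_hat, Q_hat = grid_ml(z, grid, grid)
print(R_hat, Q_hat)
```

The grid only localizes the maximum to within one grid step. A gradient or scoring iteration started from this point would be the natural refinement.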
Estimation of Noise Covariance Parameters with A Priori
Noise Covariance Information

In this problem the measurement and driving noise covariance matrices are not known precisely a priori. Rather, knowledge of them is described by a joint probability density function $f(R, Q)$, which is assumed to be known a priori. The maximum likelihood estimate of the noise covariance parameters in this case is defined by

$$ l^A(\hat{R}_n, \hat{Q}_n, Z_n) = \max_{R, Q}\, l^A(R, Q, Z_n) \tag{3.3.8} $$

where $l^A(R, Q, Z_n)$ is the augmented likelihood function, which is chosen to be the conditional probability density function

$$ l^A(R, Q, Z_n) = f(R, Q \mid Z_n) \tag{3.3.9} $$

By application of Bayes' rule

$$ f(R, Q \mid Z_n) = \frac{f(Z_n \mid R, Q)\, f(R, Q)}{f(Z_n)} \tag{3.3.10} $$

$f(Z_n)$ need not be evaluated, as it is not a function of $R$ or $Q$. Formally

$$ f(Z_n) = \iint f(Z_n \mid R, Q)\, f(R, Q)\, dR\, dQ $$

in which all $R$ and $Q$ dependence is integrated out.
Define

$$ L_n^A(R, Q, Z_n) = \ln l^A(R, Q, Z_n) \tag{3.3.11} $$

Then it can be seen that

$$ L_n^A(R, Q, Z_n) = L_n(R, Q, Z_n) + \ln f(R, Q) - \ln f(Z_n) \tag{3.3.12} $$

It is assumed that $R$ and $Q$ are independent random variables, in which case

$$ f(R, Q) = f(R)\, f(Q) $$

It is further assumed that the diagonal elements of $R$ and $Q$ are mutually independent, so

$$ f(R) = \prod_{i=1}^{\gamma} f(R^{ii}), \qquad f(Q) = \prod_{i=1}^{\eta} f(Q^{ii}) $$

Then

$$ L_n^A(R, Q, Z_n) = \text{constant} - \frac{1}{2} \sum_{i=1}^{n} \left[ \ln |B_i| + \Delta z_i^T B_i^{-1} \Delta z_i \right] + \sum_{i=1}^{\gamma} \ln f(R^{ii}) + \sum_{i=1}^{\eta} \ln f(Q^{ii}) \tag{3.3.13} $$

where "constant" includes all terms that are not functions of $R$ and $Q$.
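Differentiating with respect to $\xi_j$,

$$
\begin{aligned}
\frac{\partial L_n^A(R, Q, Z_n)}{\partial \xi_j} &= -\frac{1}{2} \sum_{i=1}^{n} \operatorname{Tr}\left[ \left( B_i^{-1} - B_i^{-1} \Delta z_i \Delta z_i^T B_i^{-1} \right) \frac{\partial B_i}{\partial \xi_j} - 2 B_i^{-1} \Delta z_i \left( \frac{\partial \hat{x}_{i|i-1}}{\partial \xi_j} \right)^T H_i^T \right] + \frac{\partial \ln f(\xi_j)}{\partial \xi_j} \\
&= \frac{\partial L_n(R, Q, Z_n)}{\partial \xi_j} + \frac{\partial \ln f(\xi_j)}{\partial \xi_j}
\end{aligned}
\tag{3.3.14}
$$

(3.3.14) is then set to zero and solved for $\hat{\xi}_n$. Again there is no general closed form solution, so some iterative procedure must be employed. However, it can be seen that the inclusion of a priori information tends to shift the solution point towards the peak of the a priori distribution of $\xi$.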
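The shift toward the prior peak can be seen in the simplest hypothetical case. Suppose $B_i = R$ is a scalar (the state is known, so the residuals are just the measurement noise) and $\xi = R$. Then $L_n(R) = -\tfrac{n}{2}(\ln R + m/R)$, where $m$ is the mean of $z_i^2$, and its maximum is at $R = m$. The Gaussian prior on $R$ used below, restricted to $R > 0$, is an arbitrary illustrative choice; the text does not prescribe a particular $f(\xi)$.

```python
import numpy as np

def log_likelihood(R, z):
    # L_n(R) from (3.3.5) with B_i = R and Delta z_i = z_i, constant dropped.
    return -0.5 * (len(z) * np.log(R) + np.sum(z**2) / R)

def log_prior(R, r0, s):
    # Hypothetical prior f(R): Gaussian with peak r0 and spread s, used for R > 0 only.
    return -0.5 * (R - r0) ** 2 / s**2

def ml_and_map(z, r0, s, grid):
    R_ml = np.mean(z**2)                                         # root of dL_n/dR = 0
    log_aug = log_likelihood(grid, z) + log_prior(grid, r0, s)   # augmented log likelihood + const
    R_map = grid[np.argmax(log_aug)]                             # maximizer with a priori information
    return R_ml, R_map

rng = np.random.default_rng(3)
z = rng.normal(0.0, np.sqrt(2.0), size=30)   # true R = 2
grid = np.linspace(0.05, 5.0, 5000)
R_ml, R_map = ml_and_map(z, r0=1.0, s=0.3, grid=grid)
print(R_ml, R_map)
```

Because the likelihood term increases below $R = m$ and decreases above it, while the prior term does the same about $r_0$, the augmented maximum must lie between $m$ and $r_0$ (up to the grid spacing).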


Estimation of Noise Covariance Parameters and the System
State with No A Priori Noise Covariance Information

In this problem the noise covariance parameters and the state are to be estimated simultaneously. No a priori information about the noise covariance parameters is to be used, but as before it is assumed that a priori state information is used. The maximum likelihood estimate of these quantities is defined by

$$ l(\hat{R}_n, \hat{Q}_n, \hat{x}_{n|n}, Z_n) = \max_{R, Q, x_n}\, l(R, Q, x_n, Z_n) \tag{3.3.15} $$

where $l(R, Q, x_n, Z_n)$ is the likelihood function, which is chosen to be the conditional probability density function

$$ l(R, Q, x_n, Z_n) = f(x_n, Z_n \mid R, Q) \tag{3.3.16} $$
where $f(x_n, Z_n \mid R, Q)$ is the joint conditional probability density function of the state $x_n$ and the measurements $Z_n$ given $R$ and $Q$. By application of Bayes' rule

$$ f(x_n, Z_n \mid R, Q) = f(x_n \mid Z_n, R, Q)\, f(Z_n \mid R, Q) \tag{3.3.17} $$

Define

$$ L_n(R, Q, x_n, Z_n) = \ln l(R, Q, x_n, Z_n) \tag{3.3.18} $$

The set of parameters to be estimated is now

$$ \alpha^T = (x_n^T, \xi^T) $$

Using (2.3.22) and (3.3.5), it can be seen that

$$ L_n(R, Q, x_n, Z_n) = \text{constant} - \frac{1}{2} \left\{ \ln |P_{n|n}| + \Delta x_n^T P_{n|n}^{-1} \Delta x_n + \sum_{i=1}^{n} \left[ \ln |B_i| + \Delta z_i^T B_i^{-1} \Delta z_i \right] \right\} \tag{3.3.19} $$

where $\Delta x_n = x_n - \hat{x}_{n|n}$ and "constant" includes all terms that are not functions of $x_n$, $R$, or $Q$.

The likelihood equations are obtained by equating the derivatives of $L_n$ with respect to $\alpha$ to zero. Dealing first with the state estimate,

$$ \frac{\partial L_n}{\partial x_n} = -(x_n - \hat{x}_{n|n})^T P_{n|n}^{-1} \tag{3.3.20} $$

Then the solution of

$$ \left. \frac{\partial L_n}{\partial x_n} \right|_{x_n = \hat{x}_n,\ \xi = \hat{\xi}_n} = 0 \tag{3.3.21} $$

is clearly

$$ \hat{x}_n = \hat{x}_{n|n}(Z_n, \hat{R}_n, \hat{Q}_n) \tag{3.3.22} $$

This says that the maximum likelihood estimate of the state $x_n$ after $n$ measurements is just the maximum likelihood state estimate that uses the estimates of $R$ and $Q$ to compute the filter gains.

The simultaneous estimates for $\xi$ ($R$ and $Q$) are found as the solutions of

$$ \left. \frac{\partial L_n}{\partial \xi} \right|_{x_n = \hat{x}_{n|n},\ \xi = \hat{\xi}_n} = 0 \tag{3.3.23} $$


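Using the identities of Appendix A, after algebraic manipulation,

$$
\begin{aligned}
\frac{\partial L_n}{\partial \xi_j} = {}& -\frac{1}{2} \operatorname{Tr}\left[ \left( P_{n|n}^{-1} - P_{n|n}^{-1} \Delta x_n \Delta x_n^T P_{n|n}^{-1} \right) \frac{\partial P_{n|n}}{\partial \xi_j} - 2 P_{n|n}^{-1} \Delta x_n \left( \frac{\partial \hat{x}_{n|n}}{\partial \xi_j} \right)^T \right] \\
& -\frac{1}{2} \sum_{i=1}^{n} \operatorname{Tr}\left[ \left( B_i^{-1} - B_i^{-1} \Delta z_i \Delta z_i^T B_i^{-1} \right) \frac{\partial B_i}{\partial \xi_j} - 2 B_i^{-1} \Delta z_i \left( \frac{\partial \hat{x}_{i|i-1}}{\partial \xi_j} \right)^T H_i^T \right]
\end{aligned}
\tag{3.3.24}
$$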
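Substituting the solution of (3.3.21) into (3.3.24), so that $\Delta x_n = 0$,

$$ \left. \frac{\partial L_n}{\partial \xi_j} \right|_{x_n = \hat{x}_{n|n},\ \xi = \hat{\xi}_n} = -\frac{1}{2} \left\{ \operatorname{Tr}\left( P_{n|n}^{-1} \frac{\partial P_{n|n}}{\partial \xi_j} \right) + \sum_{i=1}^{n} \operatorname{Tr}\left[ \left( B_i^{-1} - B_i^{-1} \Delta z_i \Delta z_i^T B_i^{-1} \right) \frac{\partial B_i}{\partial \xi_j} - 2 B_i^{-1} \Delta z_i \left( \frac{\partial \hat{x}_{i|i-1}}{\partial \xi_j} \right)^T H_i^T \right] \right\} = 0 \tag{3.3.25} $$

As before, there is no general closed form solution of (3.3.25) for $\hat{\xi}_n$, so some iterative procedure must be employed. However, when there is no driving noise ($Q = 0$), a considerable simplification occurs.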

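By use of Bayes' rule, the likelihood function (3.3.16) can be rewritten in the following form:

$$ f(x_n, Z_n \mid R, Q) = f(x_n, Z_{n-1} \mid R, Q)\, f(z_n \mid Z_{n-1}, x_n, R, Q) \tag{3.3.26} $$

By repeated application of Bayes' rule, it can be shown that

$$ f(x_n, Z_n \mid R, Q) = f(x_n \mid R, Q) \prod_{i=1}^{n} f(z_i \mid Z_{i-1}, x_n, R, Q) \tag{3.3.27} $$

When $Q = 0$, it is easy to show that

$$ f(x_n \mid R, Q) = \frac{1}{(2\pi)^{\delta/2}\, |P_{n|0}|^{1/2}} \exp\left\{ -\tfrac{1}{2} (x_n - \hat{x}_{n|0})^T P_{n|0}^{-1} (x_n - \hat{x}_{n|0}) \right\} $$

where $\delta$ is the dimension of the state vector and

$$ \hat{x}_{n|0} = \Phi(n, 0)\, \hat{x}_{0|0}, \qquad P_{n|0} = \Phi(n, 0)\, P_{0|0}\, \Phi^T(n, 0) $$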
and

$$ f(z_i \mid Z_{i-1}, x_n, R, Q) = \frac{1}{(2\pi)^{\gamma/2}\, |R|^{1/2}} \exp\left( -\tfrac{1}{2}\, \Delta z_{i|n}^T R^{-1} \Delta z_{i|n} \right) $$

where

$$ \Delta z_{i|n} = z_i - H_i \Phi(i, n)\, x_n $$


Then (3.3.19) becomes

$$ L_n(R, Q, x_n, Z_n) = \text{constant} - \frac{1}{2} \left\{ (x_n - \hat{x}_{n|0})^T P_{n|0}^{-1} (x_n - \hat{x}_{n|0}) + \sum_{i=1}^{n} \left[ \ln |R| + (z_i - H_i \Phi(i,n) x_n)^T R^{-1} (z_i - H_i \Phi(i,n) x_n) \right] \right\} \tag{3.3.28} $$


Then

$$ \frac{\partial L_n}{\partial x_n} = -(x_n - \hat{x}_{n|0})^T P_{n|0}^{-1} + \sum_{i=1}^{n} (z_i - H_i \Phi(i,n) x_n)^T R^{-1} H_i \Phi(i,n) \tag{3.3.29} $$


Define

$$ F_{n|n}(\hat{R}_n) = \sum_{i=1}^{n} \Phi^T(i,n) H_i^T \hat{R}_n^{-1} H_i \Phi(i,n) $$

Then, after algebraic manipulation, the solution of (3.3.21) for $\hat{x}_{n|n}$ is

$$ \hat{x}_{n|n} = \left( P_{n|0}^{-1} + F_{n|n} \right)^{-1} \left( P_{n|0}^{-1} \hat{x}_{n|0} + \sum_{i=1}^{n} \Phi^T(i,n) H_i^T \hat{R}_n^{-1} z_i \right) \tag{3.3.30} $$


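Using the identities of Appendix A, it can be shown that

$$ \frac{\partial L_n}{\partial \xi_j} = -\frac{1}{2} \sum_{i=1}^{n} \operatorname{Tr}\left[ \left( R^{-1} - R^{-1} \Delta z_{i|n} \Delta z_{i|n}^T R^{-1} \right) \frac{\partial R}{\partial \xi_j} \right] \tag{3.3.31} $$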
The solution of (3.3.25) for $\hat{R}_n$ then becomes

$$ \hat{R}_n^{jj} = \frac{1}{n} \sum_{i=1}^{n} \left[ (z_i - H_i \Phi(i,n) \hat{x}_{n|n})(z_i - H_i \Phi(i,n) \hat{x}_{n|n})^T \right]_{jj} \tag{3.3.32} $$

A closed form solution of (3.3.30) and (3.3.32) for $\hat{x}_{n|n}$ and $\hat{R}_n$ is not possible except in the trivial case of a scalar measurement with no a priori information about the state. In this case $P_{n|0}^{-1} = 0$ (that is, $P_{n|0} \to \infty$), the scalar $\hat{R}_n^{-1}$ factors out of both terms and cancels, and (3.3.30) becomes

$$ \hat{x}_{n|n} = \left[ \sum_{i=1}^{n} \Phi^T(i,n) H_i^T H_i \Phi(i,n) \right]^{-1} \sum_{i=1}^{n} \Phi^T(i,n) H_i^T z_i \tag{3.3.33} $$

From (3.3.33) it can be seen that $\hat{x}_{n|n}$ is not a function of $\hat{R}_n$, so that $\hat{x}_{n|n}$ can be computed independently of the value of $\hat{R}_n$ obtained from (3.3.32).

In any other case a numerical solution of (3.3.30) and 
(3.3.32) must be performed. However, even if a closed form 
solution is not obtained, the estimation equations in this 
no driving noise case have a particularly simple form. 
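Equation (3.3.30) gives $\hat{x}_{n|n}$ for a given $\hat{R}_n$, and (3.3.32) gives $\hat{R}_n$ for a given $\hat{x}_{n|n}$. One simple way to solve them numerically is therefore to alternate between the two until $\hat{R}_n$ stops changing; this approach is an assumption made here, not a procedure prescribed by the text. The sketch assumes $\Phi(i,n) = I$ (a constant state, here the two coefficients of a straight line) and one scalar measurement per step, so that $\hat{R}_n$ is a scalar. All names and data are hypothetical.

```python
import numpy as np

def q0_joint_ml(z, H, x_prior, P_prior, R_init=1.0, max_iter=200, tol=1e-12):
    """Alternate (3.3.30) and (3.3.32) for the Q = 0 case.

    Assumptions for this sketch: Phi(i, n) = I, scalar measurements
    z_i = H[i] @ x + v_i with H of shape (n, state_dim), and scalar R.
    """
    P_inv = np.linalg.inv(P_prior)
    R = R_init
    for _ in range(max_iter):
        # (3.3.30): x = (P^-1 + F)^-1 (P^-1 x_prior + sum_i H_i^T R^-1 z_i)
        F = H.T @ H / R
        x_hat = np.linalg.solve(P_inv + F, P_inv @ x_prior + H.T @ z / R)
        # (3.3.32): R = mean squared residual at the current state estimate
        R_new = np.mean((z - H @ x_hat) ** 2)
        converged = abs(R_new - R) < tol
        R = R_new
        if converged:
            break
    return x_hat, R

rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 100)
H = np.column_stack([np.ones_like(t), t])
z = H @ np.array([1.0, -2.0]) + rng.normal(0.0, 0.5, t.size)
x_hat, R_hat = q0_joint_ml(z, H, x_prior=np.zeros(2), P_prior=10.0 * np.eye(2))
print(x_hat, R_hat)
```

With a very diffuse prior (large `P_prior`), the state estimate should reduce to the ordinary least squares solution, which is the scalar-measurement case (3.3.33).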


Estimation of the Noise Covariance Parameters and the
System State with A Priori Noise Covariance Information

In this problem the state and noise covariance parameters are to be estimated simultaneously when a priori information about $R$ and $Q$ is used. The maximum likelihood estimate of these quantities in this case is defined by

$$ l^A(\hat{R}_n, \hat{Q}_n, \hat{x}_{n|n}, Z_n) = \max_{R, Q, x_n}\, l^A(R, Q, x_n, Z_n) \tag{3.3.34} $$

where $l^A(R, Q, x_n, Z_n)$ is the augmented likelihood function, which is chosen to be the conditional probability density function

$$ l^A(R, Q, x_n, Z_n) = f(R, Q, x_n \mid Z_n) \tag{3.3.35} $$

By use of Bayes' rule

$$ f(R, Q, x_n \mid Z_n) = f(x_n \mid R, Q, Z_n)\, f(R, Q \mid Z_n) \tag{3.3.36} $$

From (3.3.10)

$$ f(R, Q \mid Z_n) = \frac{f(Z_n \mid R, Q)\, f(R, Q)}{f(Z_n)} $$
fTv 


Assuming that all the diagonal elements of $R$ and $Q$ are mutually independent, it can be shown that

$$ L_n^A(R, Q, x_n, Z_n) = \ln l^A(R, Q, x_n, Z_n) = L_n(R, Q, x_n, Z_n) + \sum_{i=1}^{\gamma} \ln f(R^{ii}) + \sum_{i=1}^{\eta} \ln f(Q^{ii}) - \ln f(Z_n) \tag{3.3.37} $$

So

$$ \frac{\partial L_n^A(R, Q, x_n, Z_n)}{\partial \alpha} = \frac{\partial L_n(R, Q, x_n, Z_n)}{\partial \alpha} + \frac{\partial \ln f(\xi)}{\partial \alpha} \tag{3.3.38} $$

where $\partial L_n(R, Q, x_n, Z_n)/\partial \alpha$ is given by (3.3.20) and (3.3.24).


It can be seen that the likelihood equation for the state is unchanged by the inclusion of a priori information about $\xi$, since $f(\xi)$ is not a function of $x_n$. The likelihood equations for the noise covariance parameters are modified by the addition of the term related to the a priori probability density function of the parameters $\xi$.

Several comments should be made about the four problems 
just discussed. In each problem it was assumed that a priori 
information about the state was used in forming the state 
estimates. This assumption greatly simplifies the formula- 
tion and solution of the problem while not being unreasonably 
restrictive. If the initial state estimate is believed to be 
of poor quality, then setting its covariance to a large 
positive definite matrix will effectively result in not using 
the a priori information about the state. The assumption 
that the initial state uncertainty has a normal distribution 
is a realistic assumption in most applications. 

However, it was felt that a distinction should be made 
between noise covariance estimators which do or do not use 
a priori information about these parameters. The derivation 
of the estimation equations with no a priori noise covariance 
information is important because an arbitrary selection of an 
a priori distribution of these quantities does not have to be 
made. The proper choice of a distribution for the covariance 
parameters is much less clear than was the case in choosing a 
distribution of the initial state estimation error. The case 
of no a priori information could be handled within the frame- 
work of the estimator that uses a priori information by setting 
the covariance of the a priori noise covariance parameter 
distribution to a large quantity, but with relatively little additional effort the two cases can be treated separately.



The most physically motivated problem is the last of the
four given above, that of maximizing the joint conditional 
probability density function of the state and noise covariance 
parameters. The solution of this problem gives the most 
probable values of the state and noise covariances based upon 
the measurements and the a priori information. However, as 
will be seen, the asymptotic behavior of the solution of this 
problem is most easily obtained in terms of the asymptotic 
behavior of the simpler problem of estimating the noise 
covariance parameters alone. This is the primary motivation 
for separately treating these two problems. 

3.4 Asymptotic Properties of Noise Covariance and System State Maximum Likelihood Estimators

In Section 3.2 the asymptotic properties of a restricted set of maximum likelihood estimators were given, namely the class of estimators for which the measurements were independent and identically distributed. Now the asymptotic properties of four maximum likelihood estimators that do not fit in this category are sought:

1) noise covariance estimation with no a priori information
2) noise covariance estimation with a priori information
3) noise covariance and system state estimation with no a priori information
4) noise covariance and system state estimation with a priori information

As will be shown, if the asymptotic properties of the first of the above estimators are found, the properties of the other three follow immediately. Therefore, the asymptotic properties of the noise covariance estimator with no a priori information will be found first.

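The maximum likelihood estimate of $R$ and $Q$ was defined as the solution of (3.3.7). Define the single measurement score $S(Z_i, \xi)$, whose $j$th component is

$$ S^j(Z_i, \xi) = -\frac{1}{2} \operatorname{Tr}\left[ \left( B_i^{-1} - B_i^{-1} \Delta z_i \Delta z_i^T B_i^{-1} \right) \frac{\partial B_i}{\partial \xi_j} - 2 B_i^{-1} \Delta z_i \left( \frac{\partial \hat{x}_{i|i-1}}{\partial \xi_j} \right)^T H_i^T \right] \tag{3.4.1} $$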


(3.4.1) differs from the single measurement score of Section 3.2 because it is a function of all measurements up to and including the $i$th measurement. Define the total measurement score

$$ S_n(Z_n, \xi) = \sum_{i=1}^{n} S(Z_i, \xi) \tag{3.4.2} $$

and

$$ [J(\xi_0, \xi)]_{jk} = E[S^j(Z_i, \xi)\, S^k(Z_i, \xi) \mid \xi_0] = \int \cdots \int S^j(Z_i, \xi)\, S^k(Z_i, \xi)\, f(Z_i \mid \xi_0)\, dZ_i \tag{3.4.3} $$

$$ [J_n(\xi_0, \xi)]_{jk} = E[S_n^j(Z_n, \xi)\, S_n^k(Z_n, \xi) \mid \xi_0] = \int \cdots \int S_n^j(Z_n, \xi)\, S_n^k(Z_n, \xi)\, f(Z_n \mid \xi_0)\, dZ_n \tag{3.4.4} $$
$$ J(\xi_0) = J(\xi_0, \xi_0), \quad \text{the single measurement conditional information matrix} $$

$$ J_n(\xi_0) = J_n(\xi_0, \xi_0), \quad \text{the total measurement conditional information matrix} $$

Then the likelihood equations (3.3.6) become

$$ \frac{\partial L_n}{\partial \xi_j} = S_n^j(Z_n, \xi) = \sum_{i=1}^{n} S^j(Z_i, \xi) \tag{3.4.5} $$

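It can be shown that when $\xi = \xi_0$, the true value of the parameters, the measurement residuals $\Delta z_i$ are zero mean normal random variables with covariance $B_i$, with the further property that the residuals at different times are independent. That is,

$$ E(\Delta z_i \mid \xi_0) = 0, \qquad E(\Delta z_i\, \Delta z_l^T \mid \xi_0) = B_i(\xi_0)\, \delta_{il} $$

It can also be shown that

$$ E\left[ \operatorname{Tr}\left( B_i^{-1} \Delta z_i \left( \frac{\partial \hat{x}_{i|i-1}}{\partial \xi_j} \right)^T H_i^T \right) \Big| \xi_0 \right] = 0 $$

$$ E\left[ \operatorname{Tr}\left( B_i^{-1} \Delta z_i \left( \frac{\partial \hat{x}_{i|i-1}}{\partial \xi_j} \right)^T H_i^T \right) \operatorname{Tr}\left( B_l^{-1} \Delta z_l \left( \frac{\partial \hat{x}_{l|l-1}}{\partial \xi_k} \right)^T H_l^T \right) \Big| \xi_0 \right] = \operatorname{Tr}\left( B_i^{-1} H_i G_{i|i-1}^{jk} H_i^T \right) \delta_{il} $$

where

$$ G_{i|i-1}^{jk} = E\left[ \frac{\partial \hat{x}_{i|i-1}}{\partial \xi_j} \left( \frac{\partial \hat{x}_{i|i-1}}{\partial \xi_k} \right)^T \Big| \xi_0 \right] $$

and

$$ E\left[ \operatorname{Tr}\left( \left( B_i^{-1} - B_i^{-1} \Delta z_i \Delta z_i^T B_i^{-1} \right) \frac{\partial B_i}{\partial \xi_j} \right) \operatorname{Tr}\left( B_l^{-1} \Delta z_l \left( \frac{\partial \hat{x}_{l|l-1}}{\partial \xi_k} \right)^T H_l^T \right) \Big| \xi_0 \right] = 0 $$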


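Therefore, after algebraic manipulation it can be shown that

$$ E[S^j(Z_i, \xi_0) \mid \xi_0] = 0 \tag{3.4.6} $$

$$ E[S^j(Z_i, \xi_0)\, S^k(Z_l, \xi_0) \mid \xi_0] = \left[ \frac{1}{2} \operatorname{Tr}\left( B_i^{-1} \frac{\partial B_i}{\partial \xi_j} B_i^{-1} \frac{\partial B_i}{\partial \xi_k} \right) + \operatorname{Tr}\left( B_i^{-1} H_i G_{i|i-1}^{jk} H_i^T \right) \right] \delta_{il} \tag{3.4.7} $$

From (3.4.7) it can be seen that $S(Z_i, \xi_0)$ is independent of $S(Z_l, \xi_0)$ for $i \neq l$. It then follows immediately that

$$ E[S_n^j(Z_n, \xi_0) \mid \xi_0] = 0 \tag{3.4.8} $$

$$ E[S_n^j(Z_n, \xi_0)\, S_n^k(Z_n, \xi_0) \mid \xi_0] = \frac{1}{2} \sum_{i=1}^{n} \left[ \operatorname{Tr}\left( B_i^{-1} \frac{\partial B_i}{\partial \xi_j} B_i^{-1} \frac{\partial B_i}{\partial \xi_k} \right) + 2 \operatorname{Tr}\left( B_i^{-1} H_i G_{i|i-1}^{jk} H_i^T \right) \right] \tag{3.4.9} $$

(3.4.7) and (3.4.9) represent, respectively, the single and total measurement conditional information matrices.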
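In the simplest hypothetical case, $B_i = R$ with $H_i = 0$, so the $\partial \hat{x}$ terms vanish. Then (3.4.1) reduces to $S(Z_i, R) = -\tfrac{1}{2}(1/R - z_i^2/R^2)$, and (3.4.9) gives $J_n(R) = n/(2R^2)$. The Monte Carlo sketch below is an illustration, not part of the text. It checks (3.4.8) and (3.4.9) for that case, with an arbitrary true value $R_0 = 2$.

```python
import numpy as np

def score_moments(R0=2.0, n=50, trials=50_000, seed=5):
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, np.sqrt(R0), size=(trials, n))   # residuals at xi = xi_0
    S = -0.5 * (1.0 / R0 - z**2 / R0**2)                 # single measurement scores (3.4.1)
    S_n = S.sum(axis=1)                                  # total score (3.4.2)
    J_n = n / (2.0 * R0**2)                              # (3.4.9) with dB_i/dR = 1, H_i = 0
    return S_n.mean(), S_n.var(), J_n

mean_S, var_S, J_n = score_moments()
print(mean_S, var_S, J_n)
```

The sample mean of $S_n$ should be close to zero, and its sample variance should be close to $J_n = 6.25$.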

Because of the independence of the measurement residuals when $\xi = \xi_0$, and the other relationships shown above, the asymptotic properties of the maximum likelihood noise covariance estimator can be found relatively easily. These properties are quite similar to those mentioned in Section 3.2, even though the measurements are not now identically distributed.

Asymptotic Distribution of the Score

Suppose $(z_1, \ldots, z_n)$ is a sample from the probability density function $f(z_i \mid Z_{i-1}, \xi_0)$. Let $f(z_i \mid Z_{i-1}, \xi)$ possess finite first derivatives with respect to $\xi$ in the range $\Omega$. Then if $J_n(\xi, \xi)$ is positive definite for $\xi$ in $\Omega$, $S_n(Z_n, \xi_0)$ is asymptotically distributed for large $n$ as a zero mean normal random variable with covariance $J_n(\xi_0)$.

Proof: It has already been shown that $S_n(Z_n, \xi_0)$ is a zero mean random variable with covariance $J_n(\xi_0)$. All that remains is to show that $S_n$ is asymptotically normally distributed. From the definition of $S_n(Z_n, \xi_0)$,

$$ S_n(Z_n, \xi_0) = \sum_{i=1}^{n} S(Z_i, \xi_0) $$

It was shown that $S(Z_i, \xi_0)$ was independent of $S(Z_l, \xi_0)$ for $i \neq l$. If it is assumed that no term dominates the above sum by having a large value with appreciable probability, then by use of the central limit theorem for sums of independent random variables, the score $S_n(Z_n, \xi_0)$ can be shown to be asymptotically normally distributed for large $n$.
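The central limit argument can be seen at work in the same hypothetical scalar case ($B_i = R$, $H_i = 0$). There $\sum_i z_i^2/R_0$ is chi-square with $n$ degrees of freedom, so the total score can be sampled directly as $S_n = (\chi^2_n - n)/(2R_0)$. The sketch standardizes $S_n$ by $\sqrt{J_n(R_0)}$ and reports its skewness and the fraction of draws inside $\pm 1.96$. As $n$ grows, these should approach 0 and 0.95, respectively. The parameter values are illustrative.

```python
import numpy as np

def standardized_scores(n, R0=1.0, trials=200_000, seed=6):
    rng = np.random.default_rng(seed)
    chi2 = rng.chisquare(n, size=trials)   # sum of z_i^2 / R0 for z_i ~ N(0, R0)
    S_n = (chi2 - n) / (2.0 * R0)          # total score S_n(Z_n, R0)
    J_n = n / (2.0 * R0**2)                # variance of S_n, from (3.4.9)
    return S_n / np.sqrt(J_n)

def skewness(u):
    d = u - u.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

for n in (5, 50, 500):
    u = standardized_scores(n)
    print(n, round(skewness(u), 3), round(np.mean(np.abs(u) < 1.96), 4))
```

For this particular score, the exact skewness is $\sqrt{8/n}$, so the departure from normality dies out only like $n^{-1/2}$.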

Convergence of the Maximum Likelihood Estimator

Suppose $(z_1, \ldots, z_n)$ is a sample from the probability density function $f(z_i \mid Z_{i-1}, \xi_0)$. Let $f(z_i \mid Z_{i-1}, \xi)$ possess finite first derivatives with respect to $\xi$ in $\Omega$. Let $S^j(Z_i, \xi)$ be a continuous function of $\xi$ in $\Omega$ for all values of $Z_i$ except possibly for a set of zero probability. If, as $n \to \infty$,

$$ [J_n(\xi_0)]^{-1} \to 0 $$

then there exists a sequence of solutions of

$$ S_n(Z_n, \xi) = 0 \tag{3.4.10} $$

which converges in probability to $\xi_0$. If for $n \ge$ some $n_0$ the solution is a unique vector $\hat{\xi}_n$, the sequence of vectors $\hat{\xi}_n$ converges in probability to $\xi_0$ as $n \to \infty$.

Proof: Define

$$ A_i^j(\xi_0, \xi) = E[S^j(Z_i, \xi) \mid \xi_0] = \int \cdots \int S^j(Z_i, \xi)\, f(Z_i \mid \xi_0)\, dZ_i $$

$$ \bar{A}^j(\xi_0, \xi) = \frac{1}{n} \sum_{i=1}^{n} A_i^j(\xi_0, \xi) $$

Then $\frac{1}{n} S_n^j(Z_n, \xi)$ is the mean of a sample of size $n$ from a population having mean $\bar{A}^j(\xi_0, \xi)$ if $\xi_0$ is the true value of $\xi$. From the weak law of large numbers, $\frac{1}{n} S_n^j$ converges in probability to $\bar{A}^j(\xi_0, \xi)$. Without loss of generality, define $\Omega'$ to be $(\xi_0 - \delta, \xi_0 + \delta)$ with $\delta > 0$. It can be shown that $\bar{A}^j(\xi_0, \xi)$ is monotonically decreasing over this interval. Since $\bar{A}^j(\xi_0, \xi_0) = 0$, it follows that

$$ \bar{A}^j(\xi_0, \xi_0 - \delta) > 0, \qquad \bar{A}^j(\xi_0, \xi_0 + \delta) < 0 $$

Therefore there exists an $n(\delta, \epsilon)$ such that, if $\xi_0$ is the true value of $\xi$, the probability exceeds $1 - \epsilon$ that both of the following inequalities hold for any $n > n(\delta, \epsilon)$:

$$ \frac{1}{n} S_n^j(Z_n, \xi) > 0 \ \text{ if } \xi = \xi_0 - \delta, \qquad \frac{1}{n} S_n^j(Z_n, \xi) < 0 \ \text{ if } \xi = \xi_0 + \delta $$

Since $S^j(Z_i, \xi)$ is continuous in $\xi$ over $\Omega$ for all $Z_i$ except for a set of probability zero, a similar statement holds for $\frac{1}{n} S_n^j(Z_n, \xi)$. Therefore, for any fixed $n > n(\delta, \epsilon)$ and for some $\hat{\xi}_n$ in $\Omega'$,

$$ P\left[ \frac{1}{n} S_n^j(Z_n, \hat{\xi}_n) = 0 \Big| \xi_0 \right] > 1 - \epsilon $$

This is equivalent to the statement that a sequence of roots of (3.4.10) exists which converges in probability to $\xi_0$. In particular, if (3.4.10) has a unique solution for $n = n_0, n_0 + 1, \ldots$ for some integer $n_0$, then the sequence $\hat{\xi}_n$, $n \ge n_0$, converges in probability to $\xi_0$.
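The proof rests on the score changing sign across $(\xi_0 - \delta, \xi_0 + \delta)$. In the same hypothetical scalar case, $S_n(Z_n, R) = n(m - R)/(2R^2)$, where $m$ is the mean of $z_i^2$, so a bracketing root search on $S_n$ locates the estimate. The sketch uses bisection, which is one possible choice and is not taken from the text. The root it returns should approach the true $R_0$ as $n$ grows.

```python
import numpy as np

def total_score(R, z):
    # S_n(Z_n, R) = sum_i -1/2 (1/R - z_i^2 / R^2) for the case B_i = R, H_i = 0
    return -0.5 * np.sum(1.0 / R - z**2 / R**2)

def score_root(z, lo, hi, tol=1e-12):
    """Bisection for S_n(Z_n, R) = 0, requiring S_n(lo) > 0 > S_n(hi)."""
    if not (total_score(lo, z) > 0.0 > total_score(hi, z)):
        raise ValueError("interval does not bracket a sign change of the score")
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if total_score(mid, z) > 0.0:
            lo = mid      # score still positive: root lies above mid
        else:
            hi = mid      # score non-positive: root lies at or below mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(7)
R0, delta = 2.0, 1.5
for n in (100, 10_000):
    z = rng.normal(0.0, np.sqrt(R0), size=n)
    print(n, score_root(z, R0 - delta, R0 + delta))
```

In this special case the root is exactly $m$. The bisection is shown only because it mirrors the sign-change argument, which does not require a closed-form root.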

Asymptotic Distribution of the Maximum Likelihood Estimator

If $(z_1, \ldots, z_n)$ is a sample from the probability density function $f(z_i \mid Z_{i-1}, \xi_0)$, where $f(z_i \mid Z_{i-1}, \xi)$ possesses finite first and second derivatives with respect to $\xi$ in the range $\Omega$, and if the maximum likelihood estimator satisfying (3.4.10) is unique for $n \ge$ some $n_0$, then it is asymptotically normally distributed for large $n$ with mean $\xi_0$ and covariance $[J_n(\xi_0)]^{-1}$.

Proof: First it will be shown that

$$ J_n(\xi_0)(\hat{\xi}_n - \xi_0) - S_n(Z_n, \xi_0) \to 0 \tag{3.4.11} $$

with large probability. This will then be used to show that $\hat{\xi}_n$ is an efficient estimator and that the asymptotic distribution of $(\hat{\xi}_n - \xi_0)$ is normal with zero mean and covariance $[J_n(\xi_0)]^{-1}$. Since $\hat{\xi}_n$ satisfies the likelihood equation

$$ S_n^j(Z_n, \hat{\xi}_n) = 0 $$

a Taylor series expansion of $S_n^j(Z_n, \hat{\xi}_n)$ about $\xi_0$ gives

$$ S_n^j(Z_n, \hat{\xi}_n) = 0 = S_n^j(Z_n, \xi_0) + \left. \frac{\partial S_n^j}{\partial \xi_k} \right|_{\xi_0} \Delta\xi_0^k + \frac{1}{2} \left. \frac{\partial^2 S_n^j}{\partial \xi_k\, \partial \xi_l} \right|_{\tilde{\xi}} \Delta\xi_0^k\, \Delta\xi_0^l \tag{3.4.12} $$

where

$$ \Delta\xi_0 = \hat{\xi}_n - \xi_0 $$

and $\tilde{\xi}$ lies between $\xi_0$ and $\hat{\xi}_n$. Here, as elsewhere, index summation notation is used: if an index appears more than once on the right side of an equation with no comparable index on the left side, a summation over that index is implied.


Define

$$ C_n^{jk}(Z_n, \xi_0) = \left. \frac{\partial S_n^j}{\partial \xi_k} \right|_{\xi_0} + \frac{1}{2} \left. \frac{\partial^2 S_n^j}{\partial \xi_k\, \partial \xi_l} \right|_{\tilde{\xi}} \Delta\xi_0^l $$

Then (3.4.12) becomes

$$ 0 = S_n(Z_n, \xi_0) + C_n(Z_n, \xi_0)\, \Delta\xi_0 $$

Assuming that $C_n$ is of full rank,

$$ \Delta\xi_0 = -[C_n(Z_n, \xi_0)]^{-1} S_n(Z_n, \xi_0) \tag{3.4.13} $$

Define $b_n = -J_n(\xi_0)\, [C_n(Z_n, \xi_0)]^{-1}$. Multiplying (3.4.13) by $J_n(\xi_0)$ and rearranging terms,

$$ J_n(\xi_0)\, \Delta\xi_0 - S_n(Z_n, \xi_0) = (b_n - I)\, S_n(Z_n, \xi_0) \tag{3.4.14} $$

It will now be shown that $b_n \to I$ with large probability, in which case the right hand side of (3.4.14) tends to zero, establishing the desired result.

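As before, define

$$ L_n(Z_n, \xi) = \ln f(Z_n \mid \xi) $$

and

$$ f_n = f(Z_n \mid \xi) $$

Now define

$$ D_n(\xi_0, \xi) = \int \cdots \int \frac{1}{f_n(Z_n, \xi)} \frac{\partial^2 f_n(Z_n, \xi)}{\partial \xi\, \partial \xi}\, f_n(Z_n, \xi_0)\, dZ_n $$

Then

$$ D_n(\xi_0, \xi_0) = \int \cdots \int \frac{\partial^2 f_n(Z_n, \xi_0)}{\partial \xi_0\, \partial \xi_0}\, dZ_n $$

Assuming that differentiation with respect to $\xi_0$ can be taken outside the integral,

$$ D_n(\xi_0, \xi_0) = \frac{\partial^2}{\partial \xi_0\, \partial \xi_0} \int \cdots \int f_n(Z_n, \xi_0)\, dZ_n = \frac{\partial^2}{\partial \xi_0\, \partial \xi_0} (1) = 0 $$

But

$$ \frac{1}{f_n} \frac{\partial^2 f_n}{\partial \xi\, \partial \xi} = \frac{\partial^2 L_n}{\partial \xi\, \partial \xi} + \frac{1}{f_n^2} \frac{\partial f_n}{\partial \xi} \left( \frac{\partial f_n}{\partial \xi} \right)^T = \frac{\partial S_n(Z_n, \xi)}{\partial \xi} + S_n(Z_n, \xi)\, S_n^T(Z_n, \xi) $$

so that

$$ D_n(\xi_0, \xi_0) = 0 = E\left[ \frac{\partial S_n(Z_n, \xi_0)}{\partial \xi} \Big| \xi_0 \right] + J_n(\xi_0) $$

or

$$ E\left[ \frac{\partial S_n(Z_n, \xi_0)}{\partial \xi} \Big| \xi_0 \right] = -J_n(\xi_0) $$

Since

$$ \frac{\partial S_n(Z_n, \xi_0)}{\partial \xi} = \sum_{i=1}^{n} \frac{\partial S(Z_i, \xi_0)}{\partial \xi} $$

it can be shown by application of the strong law of large numbers that, as $n$ becomes large,

$$ \frac{1}{n} \frac{\partial S_n(Z_n, \xi_0)}{\partial \xi} \to \frac{1}{n} E\left[ \frac{\partial S_n(Z_n, \xi_0)}{\partial \xi} \Big| \xi_0 \right] = -\frac{1}{n} J_n(\xi_0) $$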
Analogous to the assumption made in Section 3.2, it is assumed that

$$ \left| \frac{1}{n} \frac{\partial^3 L_n}{\partial \xi_j\, \partial \xi_k\, \partial \xi_l} \right| < K $$

with large probability as $n \to \infty$, where $K$ is independent of $\xi$ and $n$. Since $\Delta\xi_0 \to 0$, the product

$$ \frac{1}{n} \frac{\partial^2 S_n^j}{\partial \xi_k\, \partial \xi_l}\, \Delta\xi_0^l \to 0 $$

with large probability. Assume further that for large $n$

$$ \frac{1}{n} J_n(\xi_0) \to K_1 $$

where $K_1$ is a positive definite matrix independent of $n$. Then $\frac{1}{n}\left[ C_n(Z_n, \xi_0) + J_n(\xi_0) \right] \to 0$, that is, $C_n(Z_n, \xi_0) \to -J_n(\xi_0)$, and $b_n \to I$ with large probability.

Thus it has been shown that

$$ J_n(\xi_0)(\hat{\xi}_n - \xi_0) - S_n(Z_n, \xi_0) \to 0 \tag{3.4.15} $$

It has already been shown that $S_n(Z_n, \xi_0)$ is asymptotically distributed as a normal random variable with zero mean and covariance $J_n(\xi_0)$. From this and (3.4.15), it can be concluded that $(\hat{\xi}_n - \xi_0)$ is normally distributed with zero mean and covariance $[J_n(\xi_0)]^{-1}$ as $n \to \infty$.

Wilks has shown that (3.4.15) is a necessary and sufficient condition for stating that $\hat{\xi}_n$ is an asymptotically efficient estimator for $\xi_0$.
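Asymptotic efficiency can be illustrated numerically in the same hypothetical scalar case. There the maximum likelihood estimate is $\hat{R}_n = m$, the mean of $z_i^2$, and $[J_n(R_0)]^{-1} = 2R_0^2/n$. For contrast, the sketch also computes another consistent estimator, $(\pi/2)(\text{mean } |z_i|)^2$. It is not a maximum likelihood estimator, and its variance should come out noticeably above the bound. Both the comparison estimator and the parameter values are illustrative choices.

```python
import numpy as np

def compare_variances(R0=1.5, n=100, trials=20_000, seed=9):
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, np.sqrt(R0), size=(trials, n))
    R_ml = np.mean(z**2, axis=1)                            # ML estimate, root of (3.4.10)
    R_abs = 0.5 * np.pi * np.mean(np.abs(z), axis=1) ** 2   # alternative, non-ML estimator
    bound = 2.0 * R0**2 / n                                 # [J_n(R0)]^{-1}
    return R_ml.var(), R_abs.var(), bound

v_ml, v_abs, bound = compare_variances()
print(v_ml / bound, v_abs / bound)
```

The first ratio should be close to 1. The second should be larger, by roughly 14 percent for a Gaussian sample.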

Thus it has been shown that the maximum likelihood estimator for the noise covariance parameters using no a priori information about these parameters is: 1) consistent, 2) asymptotically unbiased, 3) asymptotically normally distributed, and 4) asymptotically efficient. Now the asymptotic properties of the three closely related estimators previously mentioned are sought.

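If a priori information about $\xi$ is used, the maximum likelihood estimator was defined to be the solution of (3.3.14),

$$ \left. \frac{\partial L_n^A(\xi, Z_n)}{\partial \xi} \right|_{\xi = \hat{\xi}_1} = \left[ \frac{\partial L_n(\xi, Z_n)}{\partial \xi} + \frac{\partial \ln f(\xi)}{\partial \xi} \right]_{\xi = \hat{\xi}_1} = 0 \tag{3.4.16} $$

The estimator in the absence of a priori information is the solution of (3.3.6),

$$ \left. \frac{\partial L_n(\xi, Z_n)}{\partial \xi} \right|_{\xi = \hat{\xi}_2} = 0 \tag{3.4.17} $$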


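where $\hat{\xi}_1$ is the estimator using a priori information and $\hat{\xi}_2$ is the estimator without a priori information. Expanding (3.4.16) in a Taylor series about $\hat{\xi}_2$,

$$ \left. \frac{\partial L_n^A(\xi, Z_n)}{\partial \xi} \right|_{\hat{\xi}_1} = \left. \frac{\partial L_n^A(\xi, Z_n)}{\partial \xi} \right|_{\hat{\xi}_2} + \left. \frac{\partial^2 L_n^A(\xi, Z_n)}{\partial \xi\, \partial \xi} \right|_{\hat{\xi}_2} (\hat{\xi}_1 - \hat{\xi}_2) + \cdots = 0 \tag{3.4.18} $$

where, by (3.4.17),

$$ \left. \frac{\partial L_n^A(\xi, Z_n)}{\partial \xi} \right|_{\hat{\xi}_2} = \left. \frac{\partial \ln f(\xi)}{\partial \xi} \right|_{\hat{\xi}_2} $$

and

$$ \frac{\partial^2 L_n^A(\xi, Z_n)}{\partial \xi\, \partial \xi} = \frac{\partial^2 L_n(\xi, Z_n)}{\partial \xi\, \partial \xi} + \frac{\partial^2 \ln f(\xi)}{\partial \xi\, \partial \xi} $$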
It has been shown that for large $n$,

$$\left.\frac{\partial^2 L_n(\xi, Z_n)}{\partial \xi\,\partial \xi}\right|_{\xi_0} \approx E\left[\frac{\partial^2 L_n(\xi_0, Z_n)}{\partial \xi_0\,\partial \xi_0}\right] = -J_n(\xi_0)$$

It has already been assumed that as $n \rightarrow \infty$, $[J_n(\xi_0)]^{-1} \rightarrow 0$. Now the assumption is made that $\hat{\xi}_2$ is sufficiently close to $\xi_0$ so that the following approximation is valid:


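$$\left.\frac{\partial^2 L_n(\xi, Z_n)}{\partial \xi\,\partial \xi}\right|_{\hat{\xi}_2} \approx \left.\frac{\partial^2 L_n(\xi, Z_n)}{\partial \xi\,\partial \xi}\right|_{\xi_0} \approx -J_n(\xi_0)$$

Then

$$\left.\frac{\partial^2 L_n^A(\xi, Z_n)}{\partial \xi\,\partial \xi}\right|_{\hat{\xi}_2} \approx -J_n(\xi_0) + \left.\frac{\partial^2 \ln f(\xi)}{\partial \xi\,\partial \xi}\right|_{\hat{\xi}_2} \qquad (3.4.19)$$

It is also assumed that as $n \rightarrow \infty$, $-J_n(\xi_0)$ dominates in (3.4.19), so that

$$\left.\frac{\partial^2 L_n^A(\xi, Z_n)}{\partial \xi\,\partial \xi}\right|_{\hat{\xi}_2} \approx -J_n(\xi_0)$$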
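and then (3.4.18) becomes

$$\left.\frac{\partial \ln f(\xi)}{\partial \xi}\right|_{\hat{\xi}_2} - J_n(\xi_0)\,(\hat{\xi}_1 - \hat{\xi}_2) \approx 0$$

The first linear correction to the solution $\hat{\xi}_2$ due to inclusion of a priori information is then

$$\hat{\xi}_1 = \hat{\xi}_2 + [J_n(\xi_0)]^{-1} \left.\frac{\partial \ln f(\xi)}{\partial \xi}\right|_{\hat{\xi}_2}$$

But $[J_n(\xi_0)]^{-1} \rightarrow 0$ as $n \rightarrow \infty$, so assuming that all elements of $\partial \ln f(\xi)/\partial \xi$ evaluated at $\hat{\xi}_2$ are finite,

$$\hat{\xi}_1 \rightarrow \hat{\xi}_2 \quad \text{as } n \rightarrow \infty$$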


Therefore, under a wide set of conditions, the estimator which 
utilizes a priori information behaves asymptotically as the 
estimator which does not utilize this a priori information. 
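The size of this correction can be illustrated with a hypothetical scalar case that is not part of the development above: $n$ independent measurements $z_i \sim N(0, \xi)$, for which $J_n(\xi) = n/(2\xi^2)$, and an a priori density whose score is assumed to be $\partial \ln f/\partial \xi = (\alpha - 1)/\xi - \alpha/\gamma$ (a Gamma-type form). The sketch evaluates $J_n$ at $\hat{\xi}_2$ in place of the unknown $\xi_0$, and compares the linearly corrected value with the exact maximizer of the augmented likelihood. For a fixed $\hat{\xi}_2$ the correction shrinks roughly as $1/n$.

```python
import math


def prior_score(xi, alpha, gamma):
    """d ln f(xi)/d xi for an assumed Gamma-type prior with parameters alpha, gamma."""
    return (alpha - 1.0) / xi - alpha / gamma


def first_linear_correction(xi_2, n, alpha, gamma):
    """xi_1 = xi_2 + J_n^{-1} d ln f/d xi, with J_n = n/(2 xi^2) evaluated at xi_2."""
    info = n / (2.0 * xi_2 ** 2)
    return xi_2 + prior_score(xi_2, alpha, gamma) / info


def exact_augmented_max(sum_sq, n, alpha, gamma):
    """Positive root of dL^A/dxi = 0 for L^A = -(n/2) ln xi - S/(2 xi) + ln f(xi)."""
    # Multiplying the stationarity condition by 2*gamma*xi^2 gives a*xi^2 + b*xi + c = 0.
    a = 2.0 * alpha
    b = (n - 2.0 * (alpha - 1.0)) * gamma
    c = -sum_sq * gamma
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


for n in (10, 100, 1000):
    xi_2 = 2.0                 # estimate without a priori information (S/n)
    s = n * xi_2               # sum of squared measurements consistent with xi_2
    print(n, first_linear_correction(xi_2, n, 3.0, 1.0), exact_augmented_max(s, n, 3.0, 1.0))
```

For small $n$ the linear correction is a poor substitute for the exact root, which is consistent with its use here only as an asymptotic argument.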

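If the state and noise covariance parameters are estimated without a priori information about $\xi$, the maximum likelihood estimator was defined to be the solution of (3.3.21) and (3.3.23), or

$$\left.\frac{\partial \ln f(x_n, Z_n \mid R, Q)}{\partial x_n}\right|_{x_n = \hat{x}_{n|n},\ \xi = \hat{\xi}_1} = 0 \qquad (3.4.20)$$

$$\left.\frac{\partial \ln f(x_n, Z_n \mid R, Q)}{\partial \xi}\right|_{x_n = \hat{x}_{n|n},\ \xi = \hat{\xi}_1} = 0 \qquad (3.4.21)$$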
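The estimator for $\xi$ alone with no a priori information about $\xi$ was defined to be the solution of (3.3.6), or

$$\left.\frac{\partial \ln f(Z_n \mid R, Q)}{\partial \xi}\right|_{\xi = \hat{\xi}_2} = 0 \qquad (3.4.22)$$

where $\hat{\xi}_1$ is the estimate of $\xi$ found simultaneously with $\hat{x}_{n|n}$ and $\hat{\xi}_2$ is the estimate of $\xi$ found independently.

It can be seen that

$$\frac{\partial \ln f(x_n, Z_n \mid R, Q)}{\partial \xi} = \frac{\partial \ln f(Z_n \mid R, Q)}{\partial \xi} + \frac{\partial \ln f(x_n \mid Z_n, R, Q)}{\partial \xi} \qquad (3.4.23)$$

Expanding (3.4.21) in a Taylor series about $\hat{\xi}_2$,

$$\left.\frac{\partial \ln f(x_n, Z_n \mid R, Q)}{\partial \xi}\right|_{\substack{x_n = \hat{x}_{n|n} \\ \xi = \hat{\xi}_1}} = \left.\frac{\partial \ln f(x_n, Z_n \mid R, Q)}{\partial \xi}\right|_{\substack{x_n = \hat{x}_{n|n} \\ \xi = \hat{\xi}_2}} + \left.\frac{\partial^2 \ln f(x_n, Z_n \mid R, Q)}{\partial \xi\,\partial \xi}\right|_{\substack{x_n = \hat{x}_{n|n} \\ \xi = \hat{\xi}_2}} (\hat{\xi}_1 - \hat{\xi}_2) + \cdots = 0 \qquad (3.4.24)$$

But, by (3.4.22) and (3.4.23),

$$\left.\frac{\partial \ln f(x_n, Z_n \mid R, Q)}{\partial \xi}\right|_{\substack{x_n = \hat{x}_{n|n} \\ \xi = \hat{\xi}_2}} = \left.\frac{\partial \ln f(x_n \mid Z_n, R, Q)}{\partial \xi}\right|_{\substack{x_n = \hat{x}_{n|n} \\ \xi = \hat{\xi}_2}}$$

and

$$\left.\frac{\partial^2 \ln f(x_n, Z_n \mid R, Q)}{\partial \xi\,\partial \xi}\right|_{\substack{x_n = \hat{x}_{n|n} \\ \xi = \hat{\xi}_2}} = \left.\frac{\partial^2 \ln f(Z_n \mid R, Q)}{\partial \xi\,\partial \xi}\right|_{\xi = \hat{\xi}_2} + \left.\frac{\partial^2 \ln f(x_n \mid Z_n, R, Q)}{\partial \xi\,\partial \xi}\right|_{\substack{x_n = \hat{x}_{n|n} \\ \xi = \hat{\xi}_2}}$$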
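It has been shown that for large $n$,

$$\left.\frac{\partial^2 \ln f(Z_n \mid \xi)}{\partial \xi\,\partial \xi}\right|_{\xi_0} \approx E\left[\frac{\partial^2 \ln f(Z_n \mid \xi_0)}{\partial \xi\,\partial \xi}\right] = -J_n(\xi_0)$$

where $J_n(\xi_0)$ is the conditional information matrix. Analogous to an assumption previously made, it is assumed that $\hat{\xi}_2$ is sufficiently close to $\xi_0$ so that

$$\left.\frac{\partial^2 \ln f(Z_n \mid R, Q)}{\partial \xi\,\partial \xi}\right|_{\hat{\xi}_2} \approx -J_n(\xi_0)$$

Then

$$\left.\frac{\partial^2 \ln f(x_n, Z_n \mid R, Q)}{\partial \xi\,\partial \xi}\right|_{\substack{x_n = \hat{x}_{n|n} \\ \xi = \hat{\xi}_2}} \approx -J_n(\xi_0) + \left.\frac{\partial^2 \ln f(x_n \mid Z_n, R, Q)}{\partial \xi\,\partial \xi}\right|_{\substack{x_n = \hat{x}_{n|n} \\ \xi = \hat{\xi}_2}} \qquad (3.4.25)$$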


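where, in component form, with $P_{n|n}$ and $\hat{x}_{n|n}$ evaluated at $\hat{\xi}_2$,

$$\left.\frac{\partial^2 \ln f(x_n \mid Z_n, R, Q)}{\partial \xi_j\,\partial \xi_k}\right|_{x_n = \hat{x}_{n|n}} = -\frac{1}{2}\left[\mathrm{Tr}\!\left(P_{n|n}^{-1}\frac{\partial^2 P_{n|n}}{\partial \xi_j\,\partial \xi_k}\right) - \mathrm{Tr}\!\left(P_{n|n}^{-1}\frac{\partial P_{n|n}}{\partial \xi_j}P_{n|n}^{-1}\frac{\partial P_{n|n}}{\partial \xi_k}\right) + 2\,\mathrm{Tr}\!\left(P_{n|n}^{-1}\frac{\partial \hat{x}_{n|n}}{\partial \xi_j}\frac{\partial \hat{x}_{n|n}^T}{\partial \xi_k}\right)\right]$$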
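The trace expression can be checked numerically in a scalar setting. In the sketch below, the functions standing in for $P_{n|n}(\xi)$ and $\hat{x}_{n|n}(\xi)$ are arbitrary smooth choices made only for illustration. The second derivative of $\ln f(x \mid \xi)$ is taken with $x$ held fixed and then evaluated at $x = \hat{x}(\xi)$, which is how the formula is to be read.

```python
import math


def log_density(x, xi, mean_fn, var_fn):
    """ln of a scalar normal density N(mean_fn(xi), var_fn(xi)) evaluated at x."""
    p = var_fn(xi)
    return -0.5 * math.log(2.0 * math.pi * p) - (x - mean_fn(xi)) ** 2 / (2.0 * p)


def hessian_by_formula(xi, mean_fn, var_fn, h=1e-3):
    """Scalar form of the trace expression; dP, d2P and dx-hat by central differences."""
    p = var_fn(xi)
    dp = (var_fn(xi + h) - var_fn(xi - h)) / (2.0 * h)
    d2p = (var_fn(xi + h) - 2.0 * p + var_fn(xi - h)) / h ** 2
    dm = (mean_fn(xi + h) - mean_fn(xi - h)) / (2.0 * h)
    return -0.5 * (d2p / p - (dp / p) ** 2 + 2.0 * dm * dm / p)


def hessian_by_differencing(xi, mean_fn, var_fn, h=1e-3):
    """Second difference in xi with x held fixed at x-hat(xi)."""
    x = mean_fn(xi)

    def f(s):
        return log_density(x, s, mean_fn, var_fn)

    return (f(xi + h) - 2.0 * f(xi) + f(xi - h)) / h ** 2


def var_fn(s):
    return 1.0 + s * s      # hypothetical P_{n|n}(xi)


mean_fn = math.sin          # hypothetical x-hat_{n|n}(xi)
print(hessian_by_formula(0.7, mean_fn, var_fn), hessian_by_differencing(0.7, mean_fn, var_fn))
```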


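Assuming that as $n \rightarrow \infty$, $-J_n(\xi_0)$ dominates in (3.4.25),

$$\left.\frac{\partial^2 \ln f(x_n, Z_n \mid R, Q)}{\partial \xi\,\partial \xi}\right|_{\substack{x_n = \hat{x}_{n|n} \\ \xi = \hat{\xi}_2}} \approx -J_n(\xi_0)$$

and then (3.4.24) becomes

$$\left.\frac{\partial \ln f(x_n \mid Z_n, R, Q)}{\partial \xi}\right|_{\substack{x_n = \hat{x}_{n|n} \\ \xi = \hat{\xi}_2}} - J_n(\xi_0)\,(\hat{\xi}_1 - \hat{\xi}_2) \approx 0$$

But

$$\left.\frac{\partial \ln f(x_n \mid Z_n, R, Q)}{\partial \xi}\right|_{x_n = \hat{x}_{n|n}} = -\frac{1}{2}\,\mathrm{Tr}\!\left(P_{n|n}^{-1}\frac{\partial P_{n|n}}{\partial \xi}\right)$$

The first linear correction to the solution $\hat{\xi}_2$ due to simultaneously estimating the state is then

$$\hat{\xi}_1 = \hat{\xi}_2 - \frac{1}{2}\,[J_n(\xi_0)]^{-1}\,\mathrm{Tr}\!\left(P_{n|n}^{-1}\frac{\partial P_{n|n}}{\partial \xi}\right)$$

But as $n \rightarrow \infty$, $[J_n(\xi_0)]^{-1} \rightarrow 0$, so assuming that $\mathrm{Tr}(P_{n|n}^{-1}\,\partial P_{n|n}/\partial \xi)$ remains finite,

$$\hat{\xi}_1 \rightarrow \hat{\xi}_2 \quad \text{as } n \rightarrow \infty$$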

Therefore, the estimator of $\xi$ when the state is also estimated behaves asymptotically as the estimator which does not simultaneously estimate the state. As was shown, the estimator of $\xi$ alone converges to the true value of $\xi$, so that the state estimator which then uses this estimated value of $\xi$ converges to the true maximum likelihood state estimator discussed in Chapter 2.

Using similar arguments, the inclusion of a priori information about $\xi$ in the simultaneous state and noise covariance parameter estimator does not affect its asymptotic properties.

3.5 Selection of the A Priori Noise Covariance Distribution

The choice of f(R) and f(Q) is somewhat arbitrary, as these functions are introduced so that uncertainty in knowledge of R and Q can be properly treated. However, once selected, they can strongly influence the solutions of the likelihood equations. They must be selected to realistically represent possible variations in the values of R and Q while remaining mathematically tractable. Caution should be observed in their selection, because the simplest and seemingly realistic distributions may be unsuited for use in a maximum likelihood estimator.

Suppose that f(R) or f(Q) is defined to be nonzero only over some finite range of R or Q and is zero outside this range. Then all solutions of the likelihood equations for R and Q must also lie within this range. This can be seen by considering the following example.

Let $f(z \mid \xi)$ be the conditional probability density function of a random variable $z$, assumed to be normally distributed with zero mean and variance $\xi$. Let $f(\xi)$ be the a priori probability density function of $\xi$, defined over some finite range:

$$f(\xi) = f^*(\xi), \qquad \xi_0 < \xi < \xi_1$$
$$f(\xi) = 0 \qquad \text{otherwise}$$
By application of Bayes' rule,

$$f(\xi \mid z) = \frac{f(z \mid \xi)\, f(\xi)}{f(z)}, \qquad \text{where} \quad f(z) = \int_0^\infty f(z \mid \xi)\, f(\xi)\, d\xi$$

For any finite value of $\xi$, $f(z \mid \xi)$ is zero only at $z = \pm\infty$, and it is assumed that $f(\xi)$ is selected so that $f(z)$ is also zero only at $z = \pm\infty$. Then from the above it can be seen that $f(\xi \mid z)$ is zero outside the range $(\xi_0, \xi_1)$. This says that regardless of the shape of $f(\xi \mid z)$ within the range $(\xi_0, \xi_1)$, there can be no legitimate solutions of

$$\frac{\partial \ln f(\xi \mid z)}{\partial \xi} = 0$$

outside this range. If the range is too small and happens to exclude the true value of $\xi$, the maximum likelihood equations cannot have a valid solution at the true value of $\xi$.

So if f(R) and f(Q) are defined only over some finite or semi-infinite range of R or Q, this range must be large enough to include all possible true values of R and Q.

Since the diagonal elements of R and Q represent variances, it is clear that the a priori probability density functions for these quantities must be zero for all negative values of the diagonal elements. From the preceding discussion it can be seen that all solutions of the likelihood equations for the diagonal elements of $\hat{R}_n$ and $\hat{Q}_n$ must be positive.
Perhaps the simplest possible distribution for R and Q is a rectangular distribution for any diagonal element, denoted by $\xi$:

$$f(\xi) = \frac{1}{\xi_1 - \xi_0}, \qquad \xi_0 \le \xi \le \xi_1, \quad \xi_0 > 0 \qquad (3.5.1)$$
$$f(\xi) = 0 \qquad \text{otherwise}$$

It can be seen that this distribution does not possess finite nonzero derivatives with respect to $\xi$ for any value of $\xi$: the derivatives are either zero or infinite. Therefore

$$\frac{\partial f(\xi \mid z)}{\partial \xi} = \frac{1}{f(z)}\left[\frac{\partial f(z \mid \xi)}{\partial \xi}\, f(\xi) + f(z \mid \xi)\,\frac{\partial f(\xi)}{\partial \xi}\right] = \frac{f(\xi)}{f(z)}\,\frac{\partial f(z \mid \xi)}{\partial \xi}, \qquad \xi \ne \xi_0, \ \xi \ne \xi_1$$


This says that if $\xi_0 < \xi < \xi_1$, then the maximum of $f(\xi \mid z)$ occurs at the same point as the maximum of $f(z \mid \xi)$, and that no valid maximum can exist outside the range $(\xi_0, \xi_1)$. The solution for $\hat{\xi}$ in this case would be identical to the solution obtained by considering that no a priori information about the value of $\xi$ exists, as long as such a solution is within the range $(\xi_0, \xi_1)$.
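A minimal numerical illustration of this point, under assumed conditions ($z_i \sim N(0, \xi)$ with a rectangular prior on $[\xi_0, \xi_1]$): a brute-force grid search over the posterior recovers the unconstrained estimate $S/n$ when it lies inside the range. Otherwise the maximum over the range falls at the nearer end point, where the likelihood equation is not satisfied. The grid search and the clipping rule are illustrative devices only.

```python
import math


def log_posterior(xi, sum_sq, n, lo, hi):
    """ln f(Z|xi) + ln f(xi), up to a constant, for z_i ~ N(0, xi) and a rectangular prior."""
    if not lo <= xi <= hi:
        return -math.inf                     # f(xi) = 0 outside the range
    return -0.5 * n * math.log(xi) - sum_sq / (2.0 * xi)


def grid_maximizer(sum_sq, n, lo, hi, steps=20001):
    """Brute-force maximizer of the posterior over an evenly spaced grid on [lo, hi]."""
    best_xi, best_val = lo, -math.inf
    for k in range(steps):
        xi = lo + (hi - lo) * k / (steps - 1)
        val = log_posterior(xi, sum_sq, n, lo, hi)
        if val > best_val:
            best_xi, best_val = xi, val
    return best_xi


def clipped_mle(sum_sq, n, lo, hi):
    """Unconstrained estimate S/n, moved to the nearer end point if it falls outside."""
    return min(max(sum_sq / n, lo), hi)


for lo, hi in ((1.0, 3.0), (2.5, 4.0), (0.5, 1.5)):
    print((lo, hi), grid_maximizer(20.0, 10, lo, hi), clipped_mle(20.0, 10, lo, hi))
```

In the last two cases the range excludes the unconstrained estimate $S/n = 2$, so the reported value is an end point rather than a root of the likelihood equation.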

This is the distribution that would be used, at least in theory, if the only a priori information about $\xi$ is that $\xi$ must be positive. In such a case

$$f(\xi) = \lim_{\xi_1 \rightarrow \infty} \frac{1}{\xi_1 - \xi_0}, \qquad \xi_0 < \xi < \xi_1$$
$$f(\xi) = 0 \qquad \text{otherwise}$$

It should be noted that if a rectangular distribution of $\xi$ is used, then in the absence of any measurements, no unique maximum likelihood estimate of $\xi$ exists. This is a consequence of the fact that all values of $\xi$ within the range $(\xi_0, \xi_1)$ are equally likely to occur, so that there is no preferred value from the viewpoint of maximum likelihood.

If another estimation criterion is used, there may be a preferred value. In the case of a minimum variance estimation criterion, the mean of the distribution of $\xi$ would be the minimum variance estimate.

In many situations more may be known about $\xi$ than merely that its value lies in some range with equal probability of occurrence in that range. In such situations a more complex $f(\xi)$ should be assigned. Two possible distributions are given below, a truncated normal distribution and a Gamma distribution.


Truncated Normal Distribution 

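If $\xi$ has a truncated normal distribution, then its probability density function is given by

$$f(\xi) = K\, e^{-(\xi - \mu)^2 / 2\sigma^2}, \qquad \xi_0 < \xi < \xi_1 \qquad (3.5.2)$$
$$f(\xi) = 0 \qquad \text{otherwise}$$

where

$$K = \frac{(2/\pi)^{1/2}}{\sigma\,[\mathrm{erf}(s_1) - \mathrm{erf}(s_0)]}, \qquad s_0 = \frac{\xi_0 - \mu}{\sqrt{2}\,\sigma}, \qquad s_1 = \frac{\xi_1 - \mu}{\sqrt{2}\,\sigma}$$

erf( ) is the error function, $\mathrm{erf}(s) = (2/\sqrt{\pi})\int_0^s e^{-t^2}\,dt$
$\mu$ is the mean of the untruncated distribution
$\sigma^2$ is the variance of the untruncated distribution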


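The mean of the truncated distribution is

$$\bar{\xi} = \int_{\xi_0}^{\xi_1} \xi\, f(\xi)\, d\xi = \mu + \Delta\mu \qquad (3.5.3)$$

where $\Delta\mu = K\sigma^2\left[e^{-(\xi_0 - \mu)^2/2\sigma^2} - e^{-(\xi_1 - \mu)^2/2\sigma^2}\right]$.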
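The normalizing constant and the mean shift can be checked numerically. This sketch assumes the standard error function (Python's `math.erf`), for which $K = (2/\pi)^{1/2}/(\sigma[\mathrm{erf}(s_1) - \mathrm{erf}(s_0)])$ with $s_i = (\xi_i - \mu)/(\sqrt{2}\sigma)$ and $\Delta\mu = K\sigma^2[e^{-(\xi_0-\mu)^2/2\sigma^2} - e^{-(\xi_1-\mu)^2/2\sigma^2}]$. A composite Simpson rule confirms that the density integrates to one and that its mean equals $\mu + \Delta\mu$. The parameter values are arbitrary.

```python
import math


def truncated_normal_constants(mu, sigma, lo, hi):
    """Return K and Delta-mu for the truncated normal, using the standard math.erf."""
    s0 = (lo - mu) / (math.sqrt(2.0) * sigma)
    s1 = (hi - mu) / (math.sqrt(2.0) * sigma)
    k = math.sqrt(2.0 / math.pi) / (sigma * (math.erf(s1) - math.erf(s0)))
    e0 = math.exp(-((lo - mu) ** 2) / (2.0 * sigma ** 2))
    e1 = math.exp(-((hi - mu) ** 2) / (2.0 * sigma ** 2))
    return k, k * sigma ** 2 * (e0 - e1)


def simpson(f, a, b, m=2000):
    """Composite Simpson rule with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


mu, sigma, lo, hi = 1.0, 0.5, 0.2, 2.5
k, delta_mu = truncated_normal_constants(mu, sigma, lo, hi)


def density(x):
    return k * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


area = simpson(density, lo, hi)
mean = simpson(lambda x: x * density(x), lo, hi)
print(area, mean, mu + delta_mu)
```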


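Gamma Distribution

If $\xi$ has a Gamma distribution, then its probability density function is given by

$$f(\xi) = \frac{1}{\Gamma(\alpha)}\left(\frac{\alpha}{\gamma}\right)^{\alpha} \xi^{\alpha - 1}\, e^{-\alpha\xi/\gamma}, \qquad \xi > 0$$
$$f(\xi) = 0 \qquad \text{otherwise}$$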




where $\alpha$ and $\gamma$ are parameters of the distribution, and $\alpha > 0$. $\Gamma(\alpha)$ is the Gamma function.
The mean of the distribution is

$$\bar{\xi} = \int_0^\infty \xi\, f(\xi)\, d\xi = \gamma \qquad (3.5.6)$$

and the variance of the distribution is

$$\sigma_\xi^2 = \int_0^\infty (\xi - \bar{\xi})^2 f(\xi)\, d\xi = \frac{\gamma^2}{\alpha} \qquad (3.5.7)$$

In Chapter 2, the a priori state estimate was defined as the mean of the normal a priori state probability density function. Because of the symmetry of the normal density function, the mean is located at the point of maximum probability or likelihood. Now the a priori values of R and Q must be defined in terms of parameters of their respective distributions. The Gamma distribution is not symmetric about its mean, so that the point of maximum probability occurs at a different point than the mean of the distribution. The same is true for the truncated normal distribution if the points of truncation are not chosen to be equidistant from the mean. Because the criterion of maximum likelihood is used to define the optimal estimates of the state and noise covariance parameters, it would be consistent to define the a priori estimates of these quantities as the points of maximum likelihood of their respective a priori probability density functions. If $\hat{\xi}_0^k$ denotes the a priori estimate of the $k$th component of $\xi$, then

$$\hat{\xi}_0^k = \mu_k \qquad \text{for the truncated normal distribution (provided } \xi_0 < \mu_k < \xi_1\text{)}$$

$$\hat{\xi}_0^k = \frac{(\alpha_k - 1)\,\gamma_k}{\alpha_k} \qquad \text{for the Gamma distribution}$$
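The difference between the mean and the point of maximum likelihood of a Gamma distribution can be checked numerically. This sketch assumes the density $f(\xi) = (\alpha/\gamma)^\alpha \xi^{\alpha-1} e^{-\alpha\xi/\gamma}/\Gamma(\alpha)$ for $\xi > 0$, the form consistent with mean $\gamma$, mode $(\alpha-1)\gamma/\alpha$, and the score $(\alpha-1)/\xi - \alpha/\gamma$. The parameter values are arbitrary, and the mode formula requires $\alpha > 1$.

```python
import math


def gamma_log_density(xi, alpha, gamma):
    """ln f(xi) for the assumed Gamma density with mean gamma and shape alpha."""
    return (alpha * math.log(alpha / gamma) + (alpha - 1.0) * math.log(xi)
            - alpha * xi / gamma - math.lgamma(alpha))


def simpson(f, a, b, m):
    """Composite Simpson rule with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


alpha, gamma = 4.0, 2.0


def pdf(x):
    return math.exp(gamma_log_density(x, alpha, gamma)) if x > 0.0 else 0.0


upper = 40.0                                   # tail beyond this is negligible here
mean = simpson(lambda x: x * pdf(x), 0.0, upper, 40000)
variance = simpson(lambda x: x * x * pdf(x), 0.0, upper, 40000) - mean ** 2
mode = (alpha - 1.0) * gamma / alpha           # a priori estimate (maximum of f)
score_at_mode = (alpha - 1.0) / mode - alpha / gamma
print(mean, variance, mode, score_at_mode)
```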


Actually, if the parameters of the respective distributions are defined, there is no need to separately define the a priori estimates of $\xi$ when solving the likelihood equations. The solution is a function of the parameters of the distribution, not of $\hat{\xi}_0$. However, in subsequent sections when approximate solutions are discussed, it becomes convenient to introduce the a priori estimates as separate entities, although they will be related to the parameters of their distributions as shown above.
If a rectangular distribution of $\xi$ is selected, then no unique point of maximum likelihood of this distribution exists. In this case, the a priori estimate of $\xi$ is defined as the mean of the rectangular distribution. In fact, any point within the nonzero range of the distribution could be selected as the a priori estimate without affecting the solution, but for the sake of uniqueness, the above definition is made.

3.6 Computation of the Estimate

The likelihood equations for estimating the state and noise covariance parameters with and without the use of a priori information have been derived, but in general the equations are so complicated that solutions cannot be obtained in closed form. In this section techniques for a numerical solution of the equations are discussed. For simplicity, only one of the several possible cases is treated, that of simultaneously estimating the state and noise covariance parameters when a priori information is used. The solution of this problem includes all of the features that are necessary for the solution of the others, so that only slight modification of the discussion below is necessary in the other cases.

The solutions of the augmented likelihood equations

$$\left.\frac{\partial L_n^A(\alpha, Z_n)}{\partial \alpha}\right|_{\hat{\alpha}_n} = \left.\frac{\partial \ln f(\alpha \mid Z_n)}{\partial \alpha}\right|_{\hat{\alpha}_n} = 0$$
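are sought. A general method of solution would be to assume a trial solution and derive linear equations for small additive corrections. This process can be repeated until the corrections become negligible. If $\hat{\alpha}_0$ is the trial value of the estimate, then expanding $\partial L_n^A/\partial \alpha$ in a Taylor series and retaining only the first power of $\Delta\hat{\alpha}_0 = \hat{\alpha}_n - \hat{\alpha}_0$ leads to

$$\left.\frac{\partial L_n^A}{\partial \alpha}\right|_{\hat{\alpha}_n} \approx \left.\frac{\partial L_n^A}{\partial \alpha}\right|_{\hat{\alpha}_0} + \left.\frac{\partial^2 L_n^A}{\partial \alpha^2}\right|_{\hat{\alpha}_0} \Delta\hat{\alpha}_0 = 0 \qquad (3.6.1)$$

Assuming that $\partial^2 L_n^A/\partial \alpha^2$ evaluated at $\hat{\alpha}_0$ is of full rank, the first linear correction to $\hat{\alpha}_0$ is

$$\Delta\hat{\alpha}_0 = -\left[\left.\frac{\partial^2 L_n^A}{\partial \alpha^2}\right|_{\hat{\alpha}_0}\right]^{-1} \left.\frac{\partial L_n^A}{\partial \alpha}\right|_{\hat{\alpha}_0} \qquad (3.6.2)$$

The next trial value is then $\hat{\alpha}_0 + \Delta\hat{\alpha}_0$.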
Clearly this method has several drawbacks. Computation of $\partial^2 L_n^A/\partial \alpha^2$ and its inverse is very complicated, and once a stable solution is found, another computation, the conditional information matrix, must be performed before any evaluation of the performance of the estimator can be undertaken. A mechanization introduced by Rao eliminates these drawbacks. It is quite similar to the above method but employs one approximation which greatly reduces the number of computations. For this iterative solution, the approximation is made

$$\left.\frac{\partial^2 L_n^A}{\partial \alpha^2}\right|_{\hat{\alpha}_0} \approx -J_n^A(\hat{\alpha}_0) \qquad (3.6.3)$$
where $J_n^A(\hat{\alpha}_0)$ is the augmented conditional information matrix defined by

$$J_n^A(\hat{\alpha}_0) = E\left[\left(\frac{\partial L_n^A}{\partial \alpha}\right)^T\left(\frac{\partial L_n^A}{\partial \alpha}\right)\right]_{\alpha = \hat{\alpha}_0}$$

Thus the additive correction $\Delta\hat{\alpha}_0$ becomes

$$\Delta\hat{\alpha}_0 = [J_n^A(\hat{\alpha}_0)]^{-1} \left.\frac{\partial L_n^A}{\partial \alpha}\right|_{\hat{\alpha}_0} \qquad (3.6.4)$$

In large samples with a given $\alpha = \alpha_0$, the difference between $\partial^2 L_n^A/\partial \alpha^2$ and $-J_n^A(\alpha_0)$ will be of order $1/n$, so that the above approximation holds to first order of small quantities.

When a stable solution $\hat{\alpha}_n$ is obtained, the asymptotic estimation error is zero mean normally distributed with conditional covariance $[J_n^A(\alpha)]^{-1}$, which is closely approximated by the computed $[J_n^A(\hat{\alpha}_n)]^{-1}$.

In this method the main difficulty is the computation and inversion of the information matrix at each stage of the iteration. In practice, recomputing it at every stage is found to be unnecessary. The information matrix can be kept fixed after some stage and only the score recalculated. At the final stage, when stable values are reached, the information matrix can be recomputed at the estimated value to obtain the covariance of the estimation error.
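The scoring iteration (3.6.4), including the option of freezing the information matrix, can be sketched on a hypothetical toy model that is not from the text: independent innovations $\Delta z_i \sim N(0, B_i)$ with $B_i = h_i^2 q + r$ for known $h_i$, so that $\partial B_i/\partial q = h_i^2$ and $\partial B_i/\partial r = 1$. There are no filter dynamics and no a priori density term. The positivity guard is an added safeguard, not part of the method described above.

```python
import random


def score_and_information(q, r, hs, zs):
    """Score and information for independent z_i ~ N(0, B_i), B_i = h_i^2 q + r."""
    s = [0.0, 0.0]
    J = [[0.0, 0.0], [0.0, 0.0]]
    for h, z in zip(hs, zs):
        b = h * h * q + r
        d = (h * h, 1.0)                      # dB_i/dq, dB_i/dr
        w = 0.5 * (z * z - b) / (b * b)       # -1/2 (1/B_i - z_i^2/B_i^2)
        for j in range(2):
            s[j] += w * d[j]
            for k in range(2):
                J[j][k] += 0.5 * d[j] * d[k] / (b * b)
    return s, J


def solve2(J, v):
    """Solve the 2x2 system J x = v by Cramer's rule."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(J[1][1] * v[0] - J[0][1] * v[1]) / det,
            (J[0][0] * v[1] - J[1][0] * v[0]) / det]


def fisher_scoring(hs, zs, q, r, iterations=40, freeze_after=None):
    """Iterate alpha <- alpha + J^{-1} score, optionally freezing J after some stage."""
    J_fixed = None
    for it in range(iterations):
        s, J = score_and_information(q, r, hs, zs)
        if freeze_after is not None and it >= freeze_after:
            if J_fixed is None:
                J_fixed = J
            J = J_fixed                       # only the score is used from here on
        dq, dr = solve2(J, s)
        q, r = max(q + dq, 1e-6), max(r + dr, 1e-6)   # positivity guard (added)
    return q, r


rng = random.Random(1)
hs = [0.5 + 0.5 * (i % 5) for i in range(2000)]
zs = [h * rng.gauss(0.0, 2.0 ** 0.5) + rng.gauss(0.0, 0.5 ** 0.5) for h in hs]
q_hat, r_hat = fisher_scoring(hs, zs, 1.0, 1.0)
q_frz, r_frz = fisher_scoring(hs, zs, 1.0, 1.0, freeze_after=3)
s_final, J_final = score_and_information(q_hat, r_hat, hs, zs)
covariance = [solve2(J_final, e) for e in ([1.0, 0.0], [0.0, 1.0])]   # columns of J^{-1}
print(q_hat, r_hat, q_frz, r_frz, covariance)
```

Recomputing the information matrix once at the converged estimate, as in the last lines, gives the approximate error covariance.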

Whenever an iterative solution to a set of nonlinear equations is proposed, there is always a question of convergence. This question is reasonably well resolved in the case of the likelihood equations. Deutsch discusses this problem and references several other works on the subject. The results of his discussion are given below.

If $\hat{\alpha}_0$ is selected as the initial estimate of the solution of the likelihood equations, if $\hat{\alpha}_j$ is the $j$th iteration value of the estimate, and if $\hat{\alpha}$ is the "true" maximum likelihood estimate, then the iteration process converges if $|\hat{\alpha}_j - \hat{\alpha}|$ decreases as $j$ increases and tends to zero as $j \rightarrow \infty$. The iteration process is defined as follows. Let $g(\alpha)$ be a differentiable function which has no zero in the neighborhood of the root $\hat{\alpha}$ of the likelihood equation. The existence of $\hat{\alpha}$ is postulated. Define

$$h(\alpha) = \alpha - g(\alpha)\,\frac{\partial \ln L}{\partial \alpha}$$

where $L$ is a likelihood function. The general iteration process is then

$$\hat{\alpha}_{j+1} = [h(\alpha)]_{\alpha = \hat{\alpha}_j} = \hat{\alpha}_j - g(\hat{\alpha}_j)\left[\frac{\partial \ln L}{\partial \alpha}\right]_{\alpha = \hat{\alpha}_j}$$

If $\varepsilon_j = |\hat{\alpha}_j - \hat{\alpha}|$ is the estimation error at the $j$th iteration, then $g(\alpha)$ must be chosen such that $\varepsilon_{j+1} < \varepsilon_j$ and $\varepsilon_j \rightarrow 0$ as $j \rightarrow \infty$. This condition assures the convergence of the iteration process to the value $\hat{\alpha}$. By using the asymptotic properties of the maximum likelihood estimator for large sample sizes, the two previously given iterative techniques for the computation of the estimate can be shown to be convergent.
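The role of $g(\alpha)$ can be seen in a hypothetical scalar example (an illustration, not from the text): $n$ samples $z_i \sim N(0, \xi)$ with $\ln L = -(n/2)\ln\xi - S/(2\xi)$. Choosing $g = (\partial^2 \ln L/\partial\xi^2)^{-1}$ gives the Newton step of (3.6.2), and choosing $g = -J^{-1}$ gives the scoring step of (3.6.4). The sketch records $\varepsilon_j$ for both.

```python
def score(xi, n, sum_sq):
    """d ln L / d xi for n samples z_i ~ N(0, xi) with sum_sq = sum of z_i^2."""
    return -n / (2.0 * xi) + sum_sq / (2.0 * xi ** 2)


def second_derivative(xi, n, sum_sq):
    return n / (2.0 * xi ** 2) - sum_sq / xi ** 3


def information(xi, n):
    return n / (2.0 * xi ** 2)


def iterate(g, xi0, n, sum_sq, steps=5):
    """alpha_{j+1} = alpha_j - g(alpha_j) dlnL/dalpha; returns errors |alpha_j - alpha_hat|."""
    xi_hat = sum_sq / n
    xi, errors = xi0, [abs(xi0 - xi_hat)]
    for _ in range(steps):
        xi = xi - g(xi) * score(xi, n, sum_sq)
        errors.append(abs(xi - xi_hat))
    return errors


n, sum_sq = 10, 20.0                   # maximum likelihood estimate is 2.0


def newton(xi):
    return 1.0 / second_derivative(xi, n, sum_sq)   # g for the step (3.6.2)


def scoring(xi):
    return -1.0 / information(xi, n)                # g for the step (3.6.4)


print(iterate(newton, 1.5, n, sum_sq))
print(iterate(newton, 5.0, n, sum_sq, steps=1))
print(iterate(scoring, 5.0, n, sum_sq, steps=1))
```

Started near the root, the Newton choice satisfies $\varepsilon_{j+1} < \varepsilon_j$. Started at $\xi = 5$ it moves away, because the second derivative has the wrong sign there. In this particular model the scoring choice reaches $S/n$ in a single step.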


3.7 Computation of the Information Matrix 

By calculation of the information matrix, the asymptotic covariance of the maximum likelihood estimate can be obtained. Care must be taken to distinguish between $[J_n^A(\alpha_0)]^{-1}$ and $[\bar{J}_n^A]^{-1}$, the former being the conditional covariance of the estimate for a given value of $\alpha_0$, the latter being the average conditional covariance of the estimate, averaged over the ensemble of all possible true values of $\alpha_0$.

$$J_n^A(\alpha_0) = E\left[\left(\frac{\partial L_n^A}{\partial \alpha_0}\right)^T\left(\frac{\partial L_n^A}{\partial \alpha_0}\right)\right] \qquad (3.7.1)$$

$$\bar{J}_n^A = \int_{-\infty}^{\infty} J_n^A(\alpha_0)\, f(\alpha_0)\, d\alpha_0 \qquad (3.7.2)$$

where $L_n^A = L_n^A(\alpha_0, Z_n)$.

$[J_n^A(\alpha_0)]^{-1}$ is a highly nonlinear function of $\alpha_0$, so the average conditional covariance cannot be explicitly calculated. Fortunately, $[\bar{J}_n^A]^{-1}$ is not needed in finding $\hat{\alpha}_n$, but is only used in evaluation of the estimator performance over the ensemble of all possible $\alpha_0$. To find $\bar{J}_n^A$, some numerical evaluation of (3.7.2) is necessary.

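From (3.3.37) and (3.3.20),

$$\frac{\partial L_n^A}{\partial x_n} = -(x_n - \hat{x}_{n|n})^T P_{n|n}^{-1} \qquad (3.7.3)$$

From (3.3.37) and (3.3.24),

$$\frac{\partial L_n^A}{\partial \xi_j} = -\frac{1}{2}\left\{\mathrm{Tr}\left[\left(P_{n|n}^{-1} - P_{n|n}^{-1}\Delta x_n \Delta x_n^T P_{n|n}^{-1}\right)\frac{\partial P_{n|n}}{\partial \xi_j} - 2P_{n|n}^{-1}\Delta x_n \frac{\partial \hat{x}_{n|n}^T}{\partial \xi_j}\right] + \sum_{i=1}^{n} \mathrm{Tr}\left[\left(B_i^{-1} - B_i^{-1}\Delta z_i \Delta z_i^T B_i^{-1}\right)\frac{\partial B_i}{\partial \xi_j} - 2B_i^{-1}\Delta z_i \frac{\partial \hat{x}_{i|i-1}^T}{\partial \xi_j} H_i^T\right]\right\} + \frac{\partial \ln f(\xi_j)}{\partial \xi_j} \qquad (3.7.4)$$

where

$$\Delta x_n = x_n - \hat{x}_{n|n}, \qquad \Delta z_i = z_i - H_i \hat{x}_{i|i-1}$$

Then it follows that

$$E\left[\left(\frac{\partial L_n^A}{\partial x_n}\right)^T \left(\frac{\partial L_n^A}{\partial x_n}\right) \,\middle|\, \alpha_0\right] = P_{n|n}^{-1}(\xi_0) \qquad (3.7.5)$$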


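Using the same procedures as in obtaining (3.4.9), after algebraic manipulation, it can be shown that

$$E\left[\frac{\partial L_n^A}{\partial \xi_j}\frac{\partial L_n^A}{\partial \xi_k}\right] = \frac{1}{2}\left\{\mathrm{Tr}\left(P_{n|n}^{-1}\frac{\partial P_{n|n}}{\partial \xi_j}P_{n|n}^{-1}\frac{\partial P_{n|n}}{\partial \xi_k} + 2P_{n|n}^{-1}G_{n|n}^{jk}\right) + \sum_{i=1}^{n}\mathrm{Tr}\left(B_i^{-1}\frac{\partial B_i}{\partial \xi_j}B_i^{-1}\frac{\partial B_i}{\partial \xi_k} + 2B_i^{-1}H_i G_{i|i-1}^{jk} H_i^T\right)\right\} + \frac{\partial \ln f(\xi_j)}{\partial \xi_j}\frac{\partial \ln f(\xi_k)}{\partial \xi_k} \qquad (3.7.6)$$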
where





It can also be shown that 



(3.7.7) 


If the diagonal elements of R and Q ($\xi$) are mutually independent and are distributed with a truncated normal distribution, then

$$\frac{\partial \ln f(\xi_k)}{\partial \xi_k} = -\frac{\xi_k - \mu_k}{\sigma_k^2} \qquad (3.7.8)$$

where $\xi_k$ represents the appropriate element of R or Q, and $\mu_k$ and $\sigma_k^2$ are the mean and variance of the corresponding untruncated normal distribution.

If the diagonal elements of R and Q are distributed with a Gamma distribution, then

$$\frac{\partial \ln f(\xi_k)}{\partial \xi_k} = \frac{\alpha_k - 1}{\xi_k} - \frac{\alpha_k}{\gamma_k} \qquad (3.7.9)$$

where $\alpha_k$ and $\gamma_k$ are parameters of the corresponding Gamma distribution.
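The a priori scores (3.7.8) and (3.7.9) can be checked against central differences of the log densities. This sketch assumes the Gamma form $f(\xi) \propto \xi^{\alpha-1}e^{-\alpha\xi/\gamma}$ and evaluates the truncated normal inside its range; $\ln K$ does not depend on $\xi$, so it is set to zero. The parameter values are arbitrary.

```python
import math


def trunc_normal_log_pdf(xi, mu, sigma, log_k):
    """ln f inside the truncation range: ln K - (xi - mu)^2 / (2 sigma^2)."""
    return log_k - (xi - mu) ** 2 / (2.0 * sigma ** 2)


def gamma_log_pdf(xi, alpha, gamma):
    """ln f for the assumed Gamma density with mean gamma and shape alpha."""
    return (alpha * math.log(alpha / gamma) + (alpha - 1.0) * math.log(xi)
            - alpha * xi / gamma - math.lgamma(alpha))


def central_difference(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2.0 * h)


xi = 1.3
fd_normal = central_difference(lambda x: trunc_normal_log_pdf(x, 1.0, 0.5, 0.0), xi)
fd_gamma = central_difference(lambda x: gamma_log_pdf(x, 4.0, 2.0), xi)
print(fd_normal, -(xi - 1.0) / 0.5 ** 2)          # compare with (3.7.8)
print(fd_gamma, (4.0 - 1.0) / xi - 4.0 / 2.0)     # compare with (3.7.9)
```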

All of the necessary quantities appearing in (3.7.6) 
can be computed using recursive relationships. 
$$P_{n|n} = (I - A_n H_n)\,P_{n|n-1}\,(I - A_n H_n)^T + A_n R A_n^T \qquad (3.7.10)$$

where, because the gain $A_n$ is optimal, the terms involving $\partial A_n/\partial \xi_j$ vanish and

$$\frac{\partial P_{n|n}}{\partial \xi_j} = (I - A_n H_n)\,\frac{\partial P_{n|n-1}}{\partial \xi_j}\,(I - A_n H_n)^T + A_n \frac{\partial R}{\partial \xi_j} A_n^T$$
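Why the gain derivative drops out can be checked in a scalar case. For the Joseph form, $\partial P_{n|n}/\partial A_n$ vanishes at the optimal gain, so the total derivative with respect to a noise parameter equals the derivative taken with the gain held fixed. The sketch assumes a single scalar update with prior variance $M$ and, for the Q check, a hypothetical random-walk prediction $M = P_{prev} + Q$. It illustrates the identity only and is not the recursion of the text.

```python
def optimal_gain(m, r, h):
    return m * h / (h * h * m + r)


def posterior_variance(m, r, h, gain=None):
    """Scalar Joseph form (1 - A h)^2 M + A^2 R; the optimal gain unless one is supplied."""
    a = optimal_gain(m, r, h) if gain is None else gain
    return (1.0 - a * h) ** 2 * m + a * a * r


m, r, h, eps = 1.7, 0.4, 1.3, 1e-6
a = optimal_gain(m, r, h)

# Derivative with respect to R: gain re-optimized versus the formula (dM/dR = 0, dR/dR = 1).
d_total = (posterior_variance(m, r + eps, h) - posterior_variance(m, r - eps, h)) / (2 * eps)
d_fixed_gain = (posterior_variance(m, r + eps, h, a) - posterior_variance(m, r - eps, h, a)) / (2 * eps)
d_formula = a * a

# Derivative with respect to Q through the hypothetical prediction M = P_prev + Q.
p_prev, q = 0.9, 0.8
a_q = optimal_gain(p_prev + q, r, h)
dq_total = (posterior_variance(p_prev + q + eps, r, h)
            - posterior_variance(p_prev + q - eps, r, h)) / (2 * eps)
dq_formula = (1.0 - a_q * h) ** 2        # (1 - A h) dM/dQ (1 - A h), with dM/dQ = 1
print(d_total, d_fixed_gain, d_formula, dq_total, dq_formula)
```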
The proper initial conditions for these recursive relationships are:

$$\frac{\partial P_{0|0}}{\partial R_{jj}} = 0 \qquad (3.7.11)$$

$$\frac{\partial P_{0|0}}{\partial Q_{jj}} = 0 \qquad (3.7.12)$$

$$G_{0|0}^{jk} = 0 \qquad (3.7.13)$$
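$J_n^A(\alpha_0)$ can be partitioned into submatrices corresponding to $x_n$ and $\xi$:

$$J_n^A(\alpha_0) = \begin{bmatrix} P_{n|n}^{-1} & 0 \\ 0 & E\left[\left(\dfrac{\partial L_n^A}{\partial \xi}\right)^T \dfrac{\partial L_n^A}{\partial \xi}\right] \end{bmatrix} = \begin{bmatrix} P_{n|n}^{-1} & 0 \\ 0 & W_n^{-1} \end{bmatrix}$$

Then

$$[J_n^A(\alpha_0)]^{-1} = \begin{bmatrix} P_{n|n} & 0 \\ 0 & W_n \end{bmatrix}$$

and the average conditional covariance is

$$\begin{bmatrix} \bar{P}_{n|n} & 0 \\ 0 & \bar{W}_n \end{bmatrix}$$

where

$$\bar{P}_{n|n} = \int P_{n|n}(\xi_0)\, f(\xi_0)\, d\xi_0 \qquad \text{and} \qquad \bar{W}_n = \int W_n(\xi_0)\, f(\xi_0)\, d\xi_0$$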


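Neither $\bar{P}_{n|n}$ nor $\bar{W}_n$ can be computed analytically. An approximation to $\bar{P}_{n|n}$ and $\bar{W}_n$ could be computed by expanding $P_{n|n}$ and $W_n$ to second order about $\bar{\xi}$:

$$P_{n|n}(\xi_0) \approx P_{n|n}(\bar{\xi}) + \sum_i \left.\frac{\partial P_{n|n}}{\partial \xi^i}\right|_{\bar{\xi}} \Delta\xi_0^i + \frac{1}{2}\sum_i \sum_k \left.\frac{\partial^2 P_{n|n}}{\partial \xi^i\,\partial \xi^k}\right|_{\bar{\xi}} \Delta\xi_0^i\,\Delta\xi_0^k \qquad (3.7.19)$$

$$W_n(\xi_0) \approx W_n(\bar{\xi}) + \sum_i \left.\frac{\partial W_n}{\partial \xi^i}\right|_{\bar{\xi}} \Delta\xi_0^i + \frac{1}{2}\sum_i \sum_k \left.\frac{\partial^2 W_n}{\partial \xi^i\,\partial \xi^k}\right|_{\bar{\xi}} \Delta\xi_0^i\,\Delta\xi_0^k \qquad (3.7.20)$$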
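where

$$\Delta\xi_0 = \xi_0 - \bar{\xi}$$

But

$$E(\xi_0) = \bar{\xi} \qquad \text{and} \qquad E(\Delta\xi_0\,\Delta\xi_0^T) = \mathrm{cov}(\xi_0)$$

where $\bar{\xi}$ is the mean of the a priori distribution of true parameter values $\xi_0$ and $\mathrm{cov}(\xi_0)$ is the covariance of this distribution. Then, since the first-order terms average to zero,

$$\bar{P}_{n|n} \approx P_{n|n}(\bar{\xi}) + \frac{1}{2}\sum_i\sum_k \left.\frac{\partial^2 P_{n|n}}{\partial \xi^i\,\partial \xi^k}\right|_{\bar{\xi}} [\mathrm{cov}(\xi_0)]_{ik}$$

$$\bar{W}_n \approx W_n(\bar{\xi}) + \frac{1}{2}\sum_i\sum_k \left.\frac{\partial^2 W_n}{\partial \xi^i\,\partial \xi^k}\right|_{\bar{\xi}} [\mathrm{cov}(\xi_0)]_{ik}$$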

It is obvious that extensive computation is necessary to compute these quantities, so that this technique is not particularly attractive.

An alternate method of evaluating $\bar{P}_{n|n}$ and $\bar{W}_n$ would be to select a sample $\xi^{(1)}, \ldots, \xi^{(K)}$ chosen from the distribution $f(\xi)$ and then employ the approximations

$$\bar{P}_{n|n} \approx \frac{1}{K}\sum_{j=1}^{K} P_{n|n}(\xi^{(j)}), \qquad \bar{W}_n \approx \frac{1}{K}\sum_{j=1}^{K} W_n(\xi^{(j)})$$

Of course, the sample size $K$ must be sufficiently large to ensure that this approximation is reasonably good.
The simplest approximation to make would be

$$\bar{W}_n \approx W_n(\bar{\xi})$$

This approximation may be adequate in applications where the range of $\xi$ is limited, but caution should be employed in its use.
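The three ways of approximating an ensemble average can be compared on a hypothetical scalar stand-in (an assumption for illustration only): $P(R) = MR/(h^2 M + R)$, the posterior variance of one scalar update, averaged over a Gamma-distributed $R$ with mean 1 and variance 1/4. The sketch evaluates the value at the mean, the second-order expansion, and a sample average of $K$ draws.

```python
import random


def p_post(r, m=1.0, h=1.0):
    """Hypothetical stand-in for P_{n|n}(xi_0): M R / (h^2 M + R)."""
    return m * r / (h * h * m + r)


def second_derivative(f, x, step=1e-3):
    return (f(x + step) - 2.0 * f(x) + f(x - step)) / step ** 2


alpha, mean_r = 4.0, 1.0                 # Gamma prior on R: mean 1, variance mean^2/alpha
var_r = mean_r ** 2 / alpha

simplest = p_post(mean_r)                                            # P(xi-bar)
taylor = simplest + 0.5 * second_derivative(p_post, mean_r) * var_r  # second-order expansion
rng = random.Random(7)
K = 100000
sample = [rng.gammavariate(alpha, mean_r / alpha) for _ in range(K)] # shape, scale
monte_carlo = sum(p_post(x) for x in sample) / K
print(simplest, taylor, monte_carlo)
```

Here $P$ is concave in $R$, so the simplest approximation overstates the average; the curvature term removes most of that bias at the cost of computing second derivatives.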
Chapter 4

SUBOPTIMAL SOLUTIONS OF THE ESTIMATION PROBLEM

4.1 Introduction

An exact or iterative solution of the likelihood equations of Chapter 3 requires extensive computation, as the solution is generally found only after several passes over the measurement data. In many applications such computation is not feasible or a "real time" solution is needed. In such situations, approximate solutions are necessary, either to reduce the required computation or to obtain a real time solution of the parameter estimation problem, or both. As would be expected, the quality of the estimator is degraded in such cases, but often the degradation is not serious. However, there are certain special cases when some of the approximate solutions are not unique or are so highly biased that their use is questionable.

This chapter deals with the derivation and evaluation 
of several suboptimal approximate solutions. Also included 
is a summary of possible parameter estimators suggested by 
other authors. The list of approximate solutions is not 
exhaustive but is meant to illustrate several techniques that 
are available to obtain an adequate solution of the problem. 

4.2 Linearized Maximum Likelihood Solution

The iterative solution of the maximum likelihood equations of Chapter 3 was based upon successive relinearization of the maximum likelihood equations about trial values of the parameters obtained from the previous iteration, continuing the process until convergence. If the initial trial value of the parameter is "sufficiently close" to the true value, a single correction to the initial estimate based upon a linear approximation to the equations is often adequate for the solution. This single linearization is the basis of the linearized maximum likelihood solution.

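As in Chapter 3, the solution of

$$\left.\frac{\partial L_n^A}{\partial \alpha}\right|_{\hat{\alpha}_n} = 0$$

is sought. If $\hat{\alpha}_0$ is the trial (a priori) value of the estimate, then from (3.6.4) the linearized maximum likelihood solution $\hat{\alpha}_n$ is found from the equation

$$\hat{\alpha}_n = \hat{\alpha}_0 + [J_n^A(\hat{\alpha}_0)]^{-1} \left(\left.\frac{\partial L_n^A}{\partial \alpha}\right|_{\hat{\alpha}_0}\right)^T \qquad (4.2.1)$$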


The linearized solution $\hat{\alpha}_n$ can be found as long as $J_n^A(\hat{\alpha}_0)$ is of full rank. Both $J_n^A(\hat{\alpha}_0)$ and $\partial L_n^A/\partial \alpha$ at $\hat{\alpha}_0$ can be evaluated in real time, since they represent the conditional information matrix and the score evaluated at the a priori estimate of the parameter $\alpha$. The conditional information matrix $J_n^A(\hat{\alpha}_0)$ is expressible as a linear combination of the conditional information matrix at the previous time, $J_{n-1}^A(\hat{\alpha}_0)$, and a term which represents the additional information about the parameters contained in the measurement at time $n$. Similarly, the score $\partial L_n^A/\partial \alpha$ at $\hat{\alpha}_0$ is expressible as a linear combination of the score at the previous time, $\partial L_{n-1}^A/\partial \alpha$ at $\hat{\alpha}_0$, and a term which is a function of the measurement at time $n$. Thus as the measurements are taken, the conditional information matrix and the score can be computed as running sums, and the linearized solution (4.2.1) can be found in real time.
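The running-sum computation can be sketched on a hypothetical toy model that is not from the text: independent innovations with variance $B_i = h_i^2 q + r$ for known $h_i$, with the a priori density term omitted. Because the trial values $(q_0, r_0)$ stay fixed, each measurement adds one term to the score and one to the information matrix, and (4.2.1) can be re-solved after every measurement.

```python
import random


def increment(q0, r0, h, z):
    """Score and information contributed by one measurement, at the fixed (q0, r0)."""
    b = h * h * q0 + r0                      # B_i under the trial values
    d = (h * h, 1.0)                         # dB_i/dq, dB_i/dr
    w = 0.5 * (z * z - b) / (b * b)
    ds = [w * d[0], w * d[1]]
    dJ = [[0.5 * d[j] * d[k] / (b * b) for k in range(2)] for j in range(2)]
    return ds, dJ


def linearized_estimates(q0, r0, hs, zs):
    """Yield the linearized solution after each measurement, using running sums."""
    s = [0.0, 0.0]
    J = [[0.0, 0.0], [0.0, 0.0]]
    for h, z in zip(hs, zs):
        ds, dJ = increment(q0, r0, h, z)
        for j in range(2):
            s[j] += ds[j]
            for k in range(2):
                J[j][k] += dJ[j][k]
        det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
        if abs(det) < 1e-12:                 # information matrix not yet of full rank
            yield None
            continue
        yield (q0 + (J[1][1] * s[0] - J[0][1] * s[1]) / det,
               r0 + (J[0][0] * s[1] - J[1][0] * s[0]) / det)


rng = random.Random(3)
hs = [0.5 + 0.5 * (i % 5) for i in range(4000)]
zs = [h * rng.gauss(0.0, 2.0 ** 0.5) + rng.gauss(0.0, 0.5 ** 0.5) for h in hs]
estimates = list(linearized_estimates(1.5, 0.8, hs, zs))
print(estimates[-1])
```

With a single measurement the information matrix is singular, which is why the first value is `None`.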


Because $\partial L_n^A/\partial \alpha$ is a highly nonlinear function of $\alpha$, there is no simple way to determine when the above linearizing approximation is valid, or more importantly, when the linearized solution is "closer" to the true value than the a priori estimate. Several measures can be used to determine if the linearized solution is closer to the true solution. If the linearized solution is valid, the following inequality should be satisfied:

$$\left.\frac{\partial L_n^A}{\partial \alpha}\right|_{\hat{\alpha}_n} [J_n^A(\hat{\alpha}_n)]^{-1} \left(\left.\frac{\partial L_n^A}{\partial \alpha}\right|_{\hat{\alpha}_n}\right)^T < \left.\frac{\partial L_n^A}{\partial \alpha}\right|_{\hat{\alpha}_0} [J_n^A(\hat{\alpha}_0)]^{-1} \left(\left.\frac{\partial L_n^A}{\partial \alpha}\right|_{\hat{\alpha}_0}\right)^T$$

If this is not satisfied, another trial value $\hat{\alpha}_0$ must be found and the procedure repeated. Evaluation of this measure requires a recomputation of the score and the conditional information matrix at the value $\alpha = \hat{\alpha}_n$, so in this sense the linearized solution is not real time. However, numerical results indicate that this linearized solution converges over a wide range of $\hat{\alpha}_0$, so that in many applications this check is not necessary.
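The validity check can be sketched on the same hypothetical toy model (innovation variance $B_i = h_i^2 q + r$, no a priori term; an assumption for illustration). The quadratic form (score) $J^{-1}$ (score)$^T$ is computed at the trial value and again at the linearized solution. The second evaluation is the recomputation that makes the check non-real-time.

```python
import random


def score_info(q, r, hs, zs):
    """Score vector and information matrix for B_i = h_i^2 q + r (hypothetical model)."""
    s = [0.0, 0.0]
    J = [[0.0, 0.0], [0.0, 0.0]]
    for h, z in zip(hs, zs):
        b = h * h * q + r
        d = (h * h, 1.0)
        w = 0.5 * (z * z - b) / (b * b)
        for j in range(2):
            s[j] += w * d[j]
            for k in range(2):
                J[j][k] += 0.5 * d[j] * d[k] / (b * b)
    return s, J


def solve2(J, v):
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(J[1][1] * v[0] - J[0][1] * v[1]) / det, (J[0][0] * v[1] - J[1][0] * v[0]) / det]


def quadratic_form(s, J):
    """s J^{-1} s^T, the quantity compared in the inequality."""
    x = solve2(J, s)
    return s[0] * x[0] + s[1] * x[1]


def checked_linearization(q0, r0, hs, zs):
    s0, J0 = score_info(q0, r0, hs, zs)
    dq, dr = solve2(J0, s0)
    qn, rn = q0 + dq, r0 + dr
    sn, Jn = score_info(qn, rn, hs, zs)      # recomputation at the linearized solution
    return (qn, rn), quadratic_form(sn, Jn) < quadratic_form(s0, J0)


rng = random.Random(11)
hs = [0.5 + 0.5 * (i % 5) for i in range(4000)]
zs = [h * rng.gauss(0.0, 2.0 ** 0.5) + rng.gauss(0.0, 0.5 ** 0.5) for h in hs]
estimate, accepted = checked_linearization(1.5, 0.8, hs, zs)
print(estimate, accepted)
```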

The asymptotic conditional covariance of the linearized solution is approximately $[J_n^A(\hat{\alpha}_0)]^{-1}$. A better approximation can be obtained if computational capacity allows evaluation of $[J_n^A(\hat{\alpha}_n)]^{-1}$.

If it is known that there may be a significant error in 
the a priori estimate of a, then use of the linearized tech- 
nique may be questionable. However, in this situation a 
combination of an iterative solution plus a linearized solution 
could be used. Sufficient measurements are taken to obtain a 
relatively good estimate of a by use of the iterative proce- 
dures of Chapter 3. Subsequently, the linearized solution 
is employed, using the results of the iterative procedure 
as the point about which to linearize. 

A third procedure, sequential relinearization, could also be used. It is quite similar to the linearized solution except that at regular intervals of time, which may encompass several measurement times, the best linearized estimate of $\alpha$ is used to compute subsequent values of the information matrix and the score. At each relinearization, the score must be corrected to account for having used a different value of $\alpha$ in its computation than the newly obtained value. Let $\hat{\alpha}_1$ be the estimate of $\alpha$ that was obtained at the previous relinearization and used from then until the present in the computation of the score, and let $\hat{\alpha}_2$ be the current linearized estimate. Expanding the score in a Taylor series about $\hat{\alpha}_1$,

$$\left.\frac{\partial L_n^A}{\partial \alpha}\right|_{\hat{\alpha}_2} \approx \left.\frac{\partial L_n^A}{\partial \alpha}\right|_{\hat{\alpha}_1} + \left.\frac{\partial^2 L_n^A}{\partial \alpha\,\partial \alpha}\right|_{\hat{\alpha}_1} (\hat{\alpha}_2 - \hat{\alpha}_1)$$
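Using the approximation

$$\left.\frac{\partial^2 L_n^A}{\partial \alpha\,\partial \alpha}\right|_{\hat{\alpha}_1} \approx -J_n^A(\hat{\alpha}_1)$$

the corrected score is given by

$$\left.\frac{\partial L_n^A}{\partial \alpha}\right|_{\hat{\alpha}_2} \approx \left.\frac{\partial L_n^A}{\partial \alpha}\right|_{\hat{\alpha}_1} - J_n^A(\hat{\alpha}_1)\,(\hat{\alpha}_2 - \hat{\alpha}_1)$$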


As with the linearized solution, this procedure should be used 
only after a sufficiently accurate estimate of a is obtained, 
either from the a priori estimate or through use of the 
iterative procedure. 
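The effect of the score correction can be sketched on the hypothetical toy model used earlier in this chapter's examples (independent innovations with $B_i = h_i^2 q + r$; an assumption, not the text's system). The score accumulated at $\hat{\alpha}_1$ is compared with the exactly recomputed score at $\hat{\alpha}_2$, both with and without the correction term $-J_n^A(\hat{\alpha}_1)(\hat{\alpha}_2 - \hat{\alpha}_1)$. The two parameter points are arbitrary illustrative values.

```python
import math
import random


def score_info(q, r, hs, zs):
    """Score vector and information matrix for B_i = h_i^2 q + r (hypothetical model)."""
    s = [0.0, 0.0]
    J = [[0.0, 0.0], [0.0, 0.0]]
    for h, z in zip(hs, zs):
        b = h * h * q + r
        d = (h * h, 1.0)
        w = 0.5 * (z * z - b) / (b * b)
        for j in range(2):
            s[j] += w * d[j]
            for k in range(2):
                J[j][k] += 0.5 * d[j] * d[k] / (b * b)
    return s, J


rng = random.Random(4)
hs = [0.5 + 0.5 * (i % 5) for i in range(4000)]
zs = [h * rng.gauss(0.0, 2.0 ** 0.5) + rng.gauss(0.0, 0.5 ** 0.5) for h in hs]

a1 = (2.0, 0.5)        # hypothetical estimate from the previous relinearization
a2 = (2.05, 0.48)      # hypothetical current linearized estimate
s1, J1 = score_info(*a1, hs, zs)
s2_exact, _ = score_info(*a2, hs, zs)
step = (a2[0] - a1[0], a2[1] - a1[1])
s2_corrected = [s1[j] - (J1[j][0] * step[0] + J1[j][1] * step[1]) for j in range(2)]

err_stale = math.hypot(s1[0] - s2_exact[0], s1[1] - s2_exact[1])
err_corrected = math.hypot(s2_corrected[0] - s2_exact[0], s2_corrected[1] - s2_exact[1])
print(err_stale, err_corrected)
```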


4.3 Near Maximum Likelihood Solution

By a suitable approximation to (3.3.38), a "near maximum likelihood" solution can be found which reduces the necessary computations considerably. In this solution, the state estimate is defined to be the maximum likelihood estimate which uses the near maximum likelihood estimates of R and Q ($\xi$) to compute the filter gains, and estimates of $\xi$ are found from the solution of the "pseudo" likelihood equations:

$$\frac{\partial \Lambda_n}{\partial \xi_j} = -\frac{1}{2}\sum_{i=1}^{n}\mathrm{Tr}\left[\left(B_i^{-1} - B_i^{-1}\Delta z_i \Delta z_i^T B_i^{-1}\right)\frac{\partial B_i}{\partial \xi_j}\right] + \frac{\partial \ln f(\xi_j)}{\partial \xi_j} = 0 \qquad (4.3.1)$$

where $\Lambda_n$ is the "pseudo" likelihood function defined implicitly by (4.3.1). This equation is obtained from (3.3.38) by retaining only the most significant terms. The savings in computation arise from not having to compute $\partial \hat{x}_{i|i-1}/\partial \xi$, appearing in the likelihood function, and $G_{i|i-1}^{jk}$, appearing in the expression for the conditional information matrix (3.7.6). $\partial \hat{x}_{i|i-1}/\partial \xi$ is an array whose number of elements is the state dimension times the combined dimension of the driving noise and measurement, and $G_{i|i-1}^{jk}$ is an array with the square of that number of elements. Even if all of the symmetry properties of $G_{i|i-1}^{jk}$ are utilized, when the state, driving noise, and measurement are of moderate dimension the number of computations involved in calculating these quantities can be considerable, so that not having to perform these calculations can result in a significant saving in computer time.

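If convergence of (4.3.1) to a unique solution is obtained, the asymptotic distributions of $\hat{x}_{n|n}$ and $\hat{\xi}_n$ are approximately normal with conditional covariances

$$E[(x - \hat{x}_{n|n})(x - \hat{x}_{n|n})^T] = P_{n|n}(\xi)$$

$$E[(\xi - \hat{\xi}_n)(\xi - \hat{\xi}_n)^T] = \left\{E\left[\left(\frac{\partial \Lambda_n}{\partial \xi}\right)^T \frac{\partial \Lambda_n}{\partial \xi}\right]\right\}^{-1} = J_n^{-1}(\xi)$$

The conditional information matrix $J_n(\xi)$ is not the same as the information matrix of Chapter 3 because of the omitted terms in the likelihood function. Here

$$[J_n(\xi)]_{kj} = \frac{1}{2}\sum_{i=1}^{n}\mathrm{Tr}\left(B_i^{-1}\frac{\partial B_i}{\partial \xi_k}B_i^{-1}\frac{\partial B_i}{\partial \xi_j}\right) + \frac{\partial \ln f(\xi_k)}{\partial \xi_k}\frac{\partial \ln f(\xi_j)}{\partial \xi_j} \qquad (4.3.2)$$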


A comparison of (4.3.2) with (3.7.6) will show that the above information matrix is smaller* than the information matrix of Chapter 3. Thus, as would be expected, the covariance of the parameter estimates will be larger when the pseudo likelihood equations are used than when the full likelihood equations are solved.

Numerical results indicate that the iterative solution of the pseudo likelihood equations may present difficulties when the information matrix (4.3.2) is used as an approximation to the negative gradient of the likelihood equations. This is because in some circumstances $J_n$ given above may be nearly singular, and using its inverse in the solution may result in an unstable iterative procedure. However, these same numerical results show that the pseudo likelihood equations do have a unique solution, but it must be found using other techniques in the iteration algorithm, say a fixed step size sweep looking for zeros of the pseudo likelihood equations.
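A fixed step size sweep of this kind can be sketched for a single parameter. The model is hypothetical and not from the text: the measurement noise variance $r$ is estimated with $q$ known, the innovations are independent with $B_i = h_i^2 q + r$, and the a priori term is omitted. Each sign change of the pseudo likelihood equation found by the sweep is refined by bisection; the step size and bracket are arbitrary choices.

```python
import random


def pseudo_score_r(r, q, hs, zs):
    """-1/2 sum (1/B_i - z_i^2/B_i^2) dB_i/dr, with B_i = h_i^2 q + r and dB_i/dr = 1."""
    total = 0.0
    for h, z in zip(hs, zs):
        b = h * h * q + r
        total -= 0.5 * (1.0 / b - z * z / (b * b))
    return total


def sweep_roots(f, lo, hi, step, tol=1e-10):
    """Scan [lo, hi] with a fixed step, refining every sign change by bisection."""
    roots = []
    a, fa = lo, f(lo)
    while a < hi:
        b = min(a + step, hi)
        fb = f(b)
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:
            left, right, f_left = a, b, fa
            while right - left > tol:
                mid = 0.5 * (left + right)
                f_mid = f(mid)
                if f_left * f_mid <= 0.0:
                    right = mid
                else:
                    left, f_left = mid, f_mid
            roots.append(0.5 * (left + right))
        a, fa = b, fb
    return roots


rng = random.Random(5)
hs = [0.5 + 0.5 * (i % 5) for i in range(2000)]
zs = [h * rng.gauss(0.0, 2.0 ** 0.5) + rng.gauss(0.0, 0.5 ** 0.5) for h in hs]
roots = sweep_roots(lambda r: pseudo_score_r(r, 2.0, hs, zs), 0.05, 3.0, 0.05)
print(roots)
```

Unlike an information-matrix step, the sweep never inverts $J_n$, so near-singularity of $J_n$ does not affect it; the price is many more evaluations of the equation.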

4.4 Explicit Suboptimal Solutions

In this section, explicit "real time" solutions for the estimates of R and Q are sought. As will be shown, such estimates are approximations to the maximum likelihood solutions and on any given trial may be highly biased. However, if the a priori estimates of R and Q are sufficiently close to the true values, such estimators will provide reasonable estimates with considerably less computation than the estimators previously discussed. Even if the estimates are biased, they provide useful information. If the estimates differ consistently and significantly from the assumed a priori values, then there is good reason to doubt the accuracy of the a priori values, even though the biased estimates do not necessarily represent better estimates of R and Q. In other words, the explicit estimates will indicate if there is a significant error in the a priori values of R and Q even if they do not tell how to correct this error. In this sense their use is related to testing a hypothesis on the values of R and Q, as discussed in Chapter 5.

*A positive definite matrix A is said to be smaller than another positive definite matrix B if the matrix (A - B) is negative semi-definite.

These approximate estimators are obtained as approximate 
solutions of the pseudo likelihood equations (4.3.1). 


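$$\frac{\partial \Lambda_n}{\partial \xi_j} = -\frac{1}{2}\sum_{i=1}^{n}\mathrm{Tr}\left[\left(B_i^{-1} - B_i^{-1}\Delta z_i\Delta z_i^TB_i^{-1}\right)\frac{\partial B_i}{\partial \xi_j}\right] + \frac{\partial \ln f(\xi)}{\partial \xi_j} = 0$$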


The last term allows introduction of a distribution function 
of R and Q so that a priori estimates can be weighted with 
the estimates derived from the measurements alone. For this 
approximate solution it is convenient to form estimates of R 
and Q which are independent of this distribution function, 
and after such estimates are obtained, then the a priori 
estimates and their associated covariances are considered in 
obtaining a combined estimate for R and Q. Thus, initially, 
the solutions of the following equation are sought. 
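$$\sum_{i=1}^{n}\mathrm{Tr}\left(A_iB_i^{-1}\frac{\partial B_i}{\partial \xi_j}\right) = 0 \qquad (4.4.1)$$

where

$$A_i = I - B_i^{-1}\Delta z_i\Delta z_i^T, \qquad \text{so that} \qquad A_iB_i^{-1} = B_i^{-1} - B_i^{-1}\Delta z_i\Delta z_i^TB_i^{-1}$$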


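Using the results of Appendix A, (4.4.1) becomes

$$\sum_{i=1}^{n}\left[(A_iB_i^{-1})^{jj} + \mathrm{Tr}\left(A_iB_i^{-1}H_i\frac{\partial P_{i|i-1}}{\partial R^{jj}}H_i^T\right)\right] = 0 \qquad (4.4.2)$$

$$\sum_{i=1}^{n}\mathrm{Tr}\left(A_iB_i^{-1}H_i\frac{\partial P_{i|i-1}}{\partial Q^{jj}}H_i^T\right) = 0 \qquad (4.4.3)$$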


As the equations stand, no explicit solution for estimates 
of R and Q is possible, so further approximations must be made. 
When these approximations are made, there is a real question 
of existence of independent solutions of the resulting equations 
for the unknown elements of R and Q. Even if there are suffi- 
cient independent equations, there is no general way to obtain 
a closed form solution of the nonlinear relationships. If R 
or Q is to be estimated separately, there is no difficulty in 
obtaining a reasonable solution to the problem. Unfortunately 
the question of simultaneous estimation of these quantities 
from the above equations is not well resolved. The solutions 
given below represent separate estimation of R with Q known 
and estimation of Q with R known. The two solutions can be 
used, with caution, to simultaneously estimate R and Q, 
realizing beforehand that the resulting estimates are not 
independent. This dependency can result in biased estimates 
which fail to distinguish between errors in R and Q. However, 
as mentioned previously, some useful information can be
derived from such biased estimates.

It can be shown that for many applications

$$H_i\frac{\partial P_{i|i-1}}{\partial R^{jj}}H_i^T \ll I$$

so that (4.4.2) becomes

$$\sum_{i=1}^{n}\left(B_i^{-1} - B_i^{-1}\Delta z_i\Delta z_i^TB_i^{-1}\right)^{jj} = 0 \qquad (4.4.4)$$


But

$$B_i^{-1} = R^{-1} - R^{-1}H_iP_{i|i}H_i^TR^{-1}$$

and, since

$$x_{i|i} = x_{i|i-1} + P_{i|i}H_i^TR^{-1}(z_i - H_ix_{i|i-1})$$

it can be seen that

$$z_i - H_ix_{i|i} = z_i - H_ix_{i|i-1} - H_iP_{i|i}H_i^TR^{-1}(z_i - H_ix_{i|i-1}) = (I - H_iP_{i|i}H_i^TR^{-1})(z_i - H_ix_{i|i-1})$$

$$= R\left(R^{-1} - R^{-1}H_iP_{i|i}H_i^TR^{-1}\right)\Delta z_i = R\,B_i^{-1}\Delta z_i$$

Defining

$$\Delta z_i' = z_i - H_ix_{i|i}$$

then

$$B_i^{-1}\Delta z_i = R^{-1}\Delta z_i'$$

and (4.4.4) becomes


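$$\sum_{i=1}^{n}\left[R^{-1}\left(R - H_iP_{i|i}H_i^T - \Delta z_i'\Delta z_i'^T\right)R^{-1}\right]^{jj} = 0$$

or, since R is diagonal,

$$R^{jj} - \frac{1}{n}\sum_{i=1}^{n}\left(\Delta z_i'\Delta z_i'^T + H_iP_{i|i}H_i^T\right)^{jj} = 0 \qquad (4.4.5)$$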


It is still not possible to solve (4.4.5) for R, as $P_{i|i}$ and
$\Delta z_i'$ are highly nonlinear functions of both R and Q. However,
if either the a priori values of R and Q or some estimates of
these quantities are used to compute $\Delta z_i'^*$ and $P^*_{i|i}$, then the
estimate of R can be defined as

$$\hat R_n^{jj} = \frac{1}{n}\sum_{i=1}^{n}\left(\Delta z_i'^*\Delta z_i'^{*T} + H_iP_{i|i}^*H_i^T\right)^{jj} \qquad (4.4.6)$$

where $\Delta z_i'^*$ and $P^*_{i|i}$ are computed as functions of either the
a priori estimates of R and Q or some previously obtained
estimates.

A recursive relationship for $\hat R_n^{jj}$ can be obtained if
$\Delta z_n'^*$ and $P^*_{n|n}$ are not functions of $\hat R_n$ or $\hat Q_n$:

$$\hat R_n^{jj} = \frac{n-1}{n}\hat R_{n-1}^{jj} + \frac{1}{n}\left(\Delta z_n'^*\Delta z_n'^{*T} + H_nP_{n|n}^*H_n^T\right)^{jj} \qquad (4.4.7)$$
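When the starred quantities do not depend on the current estimate, the batch form (4.4.6) and the recursive form (4.4.7) give identical results. The sketch below checks this numerically. It assumes the following layout, and its synthetic inputs are hypothetical:

- residuals $\Delta z_i'^*$ are stacked as an (n, y) array;
- the matrices $H_iP^*_{i|i}H_i^T$ are stacked as an (n, y, y) array.

```python
import numpy as np

def r_terms(dz_post, HPH_post):
    """Per-step diagonal terms (dz'* dz'*^T + H P*_{i|i} H^T)^{jj}; shape (n, y)."""
    return dz_post**2 + np.diagonal(HPH_post, axis1=1, axis2=2)

def r_batch(dz_post, HPH_post):
    """(4.4.6): average of the per-step terms."""
    return r_terms(dz_post, HPH_post).mean(axis=0)

def r_recursive(dz_post, HPH_post):
    """(4.4.7): R_n = (n-1)/n * R_{n-1} + (1/n) * term_n.

    The starting value is irrelevant because its weight (n-1)/n is zero at n = 1.
    """
    R = np.zeros(dz_post.shape[1])
    for n, term in enumerate(r_terms(dz_post, HPH_post), start=1):
        R = (n - 1) / n * R + term / n
    return R


rng = np.random.default_rng(2)
n, y = 200, 2
dz_post = rng.normal(size=(n, y))
L = rng.normal(size=(n, y, y))
HPH_post = L @ np.transpose(L, (0, 2, 1))    # symmetric positive semidefinite stand-ins
print(r_batch(dz_post, HPH_post), r_recursive(dz_post, HPH_post))
```

Each term is a squared residual plus a diagonal element of a positive semidefinite matrix, so this form of the estimate cannot become negative.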


Equation (4.4.7) is not the only approximate solution that
could reasonably be obtained from (4.4.4). Rewriting (4.4.4),

$$\sum_{i=1}^{n}\left[B_i^{-1}\left(B_i - \Delta z_i\Delta z_i^T\right)B_i^{-1}\right]^{jj} = 0$$

or

$$\sum_{i=1}^{n}\left[B_i^{-1}\left(R + H_iP_{i|i-1}H_i^T - \Delta z_i\Delta z_i^T\right)B_i^{-1}\right]^{jj} = 0 \qquad (4.4.8)$$

If the estimation process has reached a steady state,
that is, $B_i$ = constant for all i, then an estimate of R can
be defined by

$$\hat R_n^{jj} = \frac{1}{n}\sum_{i=1}^{n}\left(\Delta z_i^*\Delta z_i^{*T} - H_iP_{i|i-1}^*H_i^T\right)^{jj} \qquad (4.4.9)$$

where $\Delta z_i^*$ and $P^*_{i|i-1}$ are equal to $\Delta z_i$ and $P_{i|i-1}$ computed as
a function of a priori values of R and Q or some past estimates
of these quantities. The form of (4.4.9) is not as desirable
as (4.4.6) because $\hat R_n$ is not necessarily positive definite.
If some of the squared residuals are small compared with
$H_iP^*_{i|i-1}H_i^T$, then some of the terms in the above sum can be
negative. If this occurs often, then the resulting estimate
of R may have negative diagonal elements. However, the estimator
has the advantage of not being a function of the value
of R that is used to update $\Delta z_n^*$ and $P^*_{n|n-1}$ at time n. This
can reduce possible bias problems in the feedback estimator
discussed later. The estimator of the form (4.4.6) is the
one studied further.
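The sign behaviour described above shows up in a scalar steady-state example. The sketch makes the following assumptions:

- hypothetical values R = 1 and $H_iP^*_{i|i-1}H_i^T = S = 4$, held constant;
- correct a priori values;
- the scalar relations $\Delta z' = RB^{-1}\Delta z$ and $H_iP_{i|i}H_i^T = SR/B$, which follow from the update equations given earlier in this section.

Under these assumptions the terms of (4.4.9) and of (4.4.6) both have mean R. Only the first can be negative.

```python
import numpy as np

R_true, S = 1.0, 4.0          # hypothetical scalar values; S stands for H P*_{i|i-1} H^T
B = R_true + S                # innovation variance B_i
rng = np.random.default_rng(1)
dz = rng.normal(0.0, np.sqrt(B), size=50_000)   # prior residuals dz_i

# Terms of (4.4.9): dz_i^2 - H P*_{i|i-1} H^T, which can be negative.
terms_449 = dz**2 - S
# Terms of (4.4.6): dz'_i^2 + H P*_{i|i} H^T with dz' = (R/B) dz and H P H^T = S R / B.
terms_446 = (R_true / B * dz)**2 + S * R_true / B

print(terms_449.mean(), terms_446.mean())          # both estimate R_true
print((terms_449 < 0).mean(), (terms_446 < 0).mean())
```

In this configuration a large fraction of the (4.4.9) terms are negative, while every (4.4.6) term is positive.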

Obtaining an explicit estimate of Q is not as straightforward
as obtaining the estimate of R. There are many
approximate solutions to (4.4.3) for $\hat Q_n$ depending upon the
nature of the approximations made. The solution given below
is but one of several possible solutions, but it is felt that
it has the advantage of simplicity and wide applicability.
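By manipulation of (4.4.3) it can be shown that

$$\sum_{i=1}^{n}\mathrm{Tr}\left(A_iB_i^{-1}H_i\frac{\partial P_{i|i-1}}{\partial Q^{jj}}H_i^T\right) = \sum_{i=1}^{n}\mathrm{Tr}\left[P_{i|i-1}^{-1}\left(P_{i|i-1} - \Delta x_i\Delta x_i^T - P_{i|i}\right)P_{i|i-1}^{-1}\frac{\partial P_{i|i-1}}{\partial Q^{jj}}\right]$$

where $\Delta x_i = x_{i|i} - x_{i|i-1}$. Writing $P_{i|i-1} = U_i + \Gamma_iQ\Gamma_i^T$ with $U_i = \Phi(i,i-1)P_{i-1|i-1}\Phi^T(i,i-1)$, (4.4.3) then becomes

$$\sum_{i=1}^{n}\mathrm{Tr}\left[P_{i|i-1}^{-1}\left(\Gamma_iQ\Gamma_i^T - \Delta x_i\Delta x_i^T - P_{i|i} + U_i\right)P_{i|i-1}^{-1}\left(\Gamma_i\frac{\partial Q}{\partial Q^{jj}}\Gamma_i^T + \Phi(i,i-1)\frac{\partial P_{i-1|i-1}}{\partial Q^{jj}}\Phi^T(i,i-1)\right)\right] = 0$$

In many applications, $\Gamma_i\frac{\partial Q}{\partial Q^{jj}}\Gamma_i^T \gg \Phi(i,i-1)\frac{\partial P_{i-1|i-1}}{\partial Q^{jj}}\Phi^T(i,i-1)$, so (4.4.3) becomes

$$\sum_{i=1}^{n}\left[\Gamma_i^TP_{i|i-1}^{-1}\left(\Gamma_iQ\Gamma_i^T - \Delta x_i\Delta x_i^T - P_{i|i} + U_i\right)P_{i|i-1}^{-1}\Gamma_i\right]^{jj} = 0 \qquad (4.4.10)$$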


Equation (4.4.10) cannot be solved explicitly for Q, so
additional approximations are necessary. If it is assumed
that $\Gamma_i$ and $P_{i|i-1}$ are approximately constant for all i, then
(4.4.10) becomes

$$\left[\Gamma_n^TP_{n|n-1}^{-1}\sum_{i=1}^{n}\left(\Gamma_iQ\Gamma_i^T - \Delta x_i\Delta x_i^T - P_{i|i} + U_i\right)P_{n|n-1}^{-1}\Gamma_n\right]^{jj} = 0$$

The equation above is satisfied if

$$\sum_{i=1}^{n}\left(\Gamma_iQ\Gamma_i^T - \Delta x_i\Delta x_i^T - P_{i|i} + U_i\right) = 0 \qquad (4.4.11)$$

or

$$Q^{jj} - \frac{1}{n}\sum_{i=1}^{n}\left[\Gamma_i^{-1}\left(\Delta x_i\Delta x_i^T + P_{i|i} - U_i\right)\Gamma_i^{T-1}\right]^{jj} = 0$$


If $\Gamma_i^{-1}$ does not exist, the generalized inverse of $\Gamma_i$ is to be
used. (See Appendix A for discussion of the generalized inverse.)
In general the dimension of the driving noise vector is less
than or equal to the dimension of the state, in which case,
provided $\Gamma_i$ has full column rank, $(\Gamma_i^T\Gamma_i)^{-1}$ exists and the
generalized inverse of $\Gamma_i$ is

$$\Gamma_i^{\#} = (\Gamma_i^T\Gamma_i)^{-1}\Gamma_i^T$$
The estimate of Q is defined as

$$\hat Q_n^{jj} = \frac{1}{n}\sum_{i=1}^{n}\left[\Gamma_i^{-1}\left(\Delta x_i^*\Delta x_i^{*T} + P_{i|i}^* - U_i^*\right)\Gamma_i^{T-1}\right]^{jj} \qquad (4.4.12)$$

where $\Delta x_i^*$, $P^*_{i|i}$, and $U_i^*$ are computed as functions of the
a priori estimates of R and Q or some past estimates. If
$\Delta x_n^*$, $P^*_{n|n}$, and $U_n^*$ are not functions of $\hat R_n$ or $\hat Q_n$, a recursive
relationship can be obtained:

$$\hat Q_n^{jj} = \frac{n-1}{n}\hat Q_{n-1}^{jj} + \frac{1}{n}\left[\Gamma_n^{-1}\left(\Delta x_n^*\Delta x_n^{*T} + P_{n|n}^* - U_n^*\right)\Gamma_n^{T-1}\right]^{jj} \qquad (4.4.13)$$
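The estimators (4.4.12) and (4.4.13) can be sketched with the generalized inverse $(\Gamma_i^T\Gamma_i)^{-1}\Gamma_i^T$ in place of $\Gamma_i^{-1}$. The inputs $\Delta x_i^*$, $P^*_{i|i}$ and $U_i^*$ are taken as precomputed arrays. The example is hypothetical: it is built so that $\Delta x\Delta x^T + P - U = \Gamma Q\Gamma^T$ exactly, in which case each term should return the diagonal of Q.

```python
import numpy as np

def gen_inverse(G):
    """Gamma^# = (Gamma^T Gamma)^{-1} Gamma^T; assumes Gamma has full column rank."""
    return np.linalg.solve(G.T @ G, G.T)

def q_term(G, dx, P_post, U):
    """Diagonal of Gamma^# (dx dx^T + P_{i|i} - U_i) Gamma^#T, the summand of (4.4.12)."""
    Gs = gen_inverse(G)
    M = np.outer(dx, dx) + P_post - U
    return np.diag(Gs @ M @ Gs.T)

def q_recursive(G_seq, dx_seq, P_seq, U_seq):
    """(4.4.13): Q_n = (n-1)/n * Q_{n-1} + (1/n) * term_n."""
    Q = np.zeros(G_seq[0].shape[1])
    for n, args in enumerate(zip(G_seq, dx_seq, P_seq, U_seq), start=1):
        Q = (n - 1) / n * Q + q_term(*args) / n
    return Q


rng = np.random.default_rng(3)
G = rng.normal(size=(3, 2))                 # hypothetical Gamma: 3 states, 2 noise inputs
Q_true = np.diag([0.4, 1.5])
# Constructed case: dx = 0, U = 0, P_{i|i} = Gamma Q Gamma^T, so the term equals diag(Q_true).
term = q_term(G, np.zeros(3), G @ Q_true @ G.T, np.zeros((3, 3)))
print(term, np.allclose(gen_inverse(G), np.linalg.pinv(G)))
```

For a full column rank $\Gamma$ this generalized inverse coincides with the Moore-Penrose pseudo-inverse, which is why the comparison with `np.linalg.pinv` is included.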
Two classes of estimators of the form (4.4.7) and
(4.4.13) exist, depending upon what use is made of past
estimates of R and Q:

1) no feedback estimators

2) feedback estimators

In the no feedback case, a priori values of R and Q are
used to compute the quantities denoted by a * in the estimator
equations. In the feedback case these quantities are computed
as functions of past estimates of R and Q: at each stage the
best available estimates of R and Q are used to update the
starred quantities. If feedback is employed and the variance
estimation process converges to the true values of R and
Q, then the state estimate $x_{n|n}$ will converge in most
applications to the optimal state estimate that would be obtained
if the true values of R and Q were known a priori. However,
using this estimation scheme, convergence to the true values
is not guaranteed.

In fact, numerical results indicate that if the a priori
values of R and Q are significantly in error, the process
will converge, but to biased and incorrect estimates of the
variance parameters. Techniques for evaluating the performance
of the feedback and no feedback estimators are given
next.
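To make the distinction between the two classes concrete, the sketch below runs a scalar filter on a hypothetical random-walk model with $\Phi = H = \Gamma = 1$, so that $U_k = P^*_{k-1|k-1}$. It accumulates (4.4.7) and (4.4.13) with and without feedback. The model and the numerical values are assumptions made for this illustration. So is the small floor applied to the Q estimate in the feedback branch. None of these are part of the original development.

```python
import numpy as np

def simulate(R_true, Q_true, R0, Q0, n, feedback=False, seed=0):
    """Scalar random walk x_k = x_{k-1} + w_k, z_k = x_k + v_k (Phi = H = Gamma = 1).

    The filter gains use R0, Q0 (no feedback) or the running estimates (feedback).
    Returns the recursive estimates (4.4.7) and (4.4.13) after n measurements.
    """
    rng = np.random.default_rng(seed)
    x, xf, Pf = 0.0, 0.0, 1.0              # true state, x_{k-1|k-1}, P*_{k-1|k-1}
    R_hat, Q_hat = 0.0, 0.0
    for k in range(1, n + 1):
        x += rng.normal(0.0, np.sqrt(Q_true))
        z = x + rng.normal(0.0, np.sqrt(R_true))
        if feedback and k > 1:
            R_use, Q_use = R_hat, max(Q_hat, 1e-6)   # floor on Q: safeguard for this sketch
        else:
            R_use, Q_use = R0, Q0
        U = Pf                             # U_k = Phi P*_{k-1|k-1} Phi^T with Phi = 1
        P_prior = U + Q_use                # P*_{k|k-1}
        K = P_prior / (R_use + P_prior)    # gain A*_k
        dx = K * (z - xf)                  # x_{k|k} - x_{k|k-1}
        xf += dx
        Pf = (1.0 - K) * P_prior           # P*_{k|k}
        dz_post = z - xf                   # Delta z'*_k
        R_hat = (k - 1) / k * R_hat + (dz_post**2 + Pf) / k      # (4.4.7)
        Q_hat = (k - 1) / k * Q_hat + (dx**2 + Pf - U) / k       # (4.4.13)
    return R_hat, Q_hat


print(simulate(1.0, 0.5, 1.0, 0.5, 20_000))                  # correct a priori values
print(simulate(1.0, 0.5, 4.0, 0.5, 20_000))                  # a priori R too large
print(simulate(1.0, 0.5, 4.0, 0.5, 20_000, feedback=True))
```

With correct a priori values both estimates should settle near R = 1 and Q = 0.5. With the a priori R set to 4, the no feedback R estimate is biased, which the analysis that follows quantifies.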

The two measures which seem appropriate for evaluating 
the performance of the explicit suboptimal estimators are 
the mean and mean square error of the estimates of R and Q. 

In the preceding section, estimators for the diagonal elements
of R and Q were developed, resulting in (y + η) estimator
equations. The mean square error matrix of such estimates is
a (y + η) × (y + η) matrix, which includes the means of all
quadratic functions of the errors in the diagonal elements of
R and Q. Such a matrix is most difficult to compute, so for
the purposes of this development only the diagonal elements
of such a matrix will be considered.

As mentioned in Chapter 2, a distinction must be made
between conditional and unconditional expectation operators.
The same notation as in that chapter will be used to make
this distinction.

First, the performance of the no feedback estimator will 
be discussed. From (4.4.7),

$$\hat R_n^{jj} = \frac{n-1}{n}\hat R_{n-1}^{jj} + \frac{1}{n}\left(\Delta z_n'^*\Delta z_n'^{*T} + H_nP_{n|n}^*H_n^T\right)^{jj}$$

The conditional expected value of $\hat R_n^{jj}$ is

$$e(\hat R_n^{jj}) = \frac{n-1}{n}e(\hat R_{n-1}^{jj}) + \frac{1}{n}\left[e(\Delta z_n'^*\Delta z_n'^{*T}) + H_ne(P_{n|n}^*)H_n^T\right]^{jj}$$

This conditional expected value is conditioned upon the fact
that the a priori estimates $\hat R_0$ and $\hat Q_0$ are used to compute the
filter gains while the true values of these covariances are
R and Q. Averaging is performed over the ensemble of all
driving and measurement noises as well as all possible initial
state conditions.
$$\Delta z_n'^* = z_n - H_nx_{n|n} = v_n - H_n\tilde x_{n|n}$$

where $\tilde x_{n|n} = x_{n|n} - x_n$ is the estimation error of the filter. So

$$e(\Delta z_n'^*\Delta z_n'^{*T}) = e(v_nv_n^T) + H_ne(\tilde x_{n|n}\tilde x_{n|n}^T)H_n^T - H_ne(\tilde x_{n|n}v_n^T) - e(v_n\tilde x_{n|n}^T)H_n^T = R + H_nP_{n|n}H_n^T - H_nA_n^*R - RA_n^{*T}H_n^T$$

since $\tilde x_{n|n} = (I - A_n^*H_n)\tilde x_{n|n-1} + A_n^*v_n$ with $v_n$ independent of $\tilde x_{n|n-1}$, and where

$$A_n^* = P_{n|n-1}^*H_n^T\left(\hat R_0 + H_nP_{n|n-1}^*H_n^T\right)^{-1}, \qquad P_{n|n} = e(\tilde x_{n|n}\tilde x_{n|n}^T)$$

($P_{n|n}$ is not $P^*_{n|n}$ unless $\hat R_0 = R$ and $\hat Q_0 = Q$.)

In the no feedback case, $P^*_{n|n}$ is not a random variable under
the expectation operator, so $e(P^*_{n|n}) = P^*_{n|n}$ and

$$e(\hat R_n^{jj}) = \frac{n-1}{n}e(\hat R_{n-1}^{jj}) + \frac{1}{n}\left(R^{jj} + \Delta F_n^{jj}\right) \qquad (4.4.14)$$

where

$$\Delta F_n = H_n(P_{n|n} + P_{n|n}^*)H_n^T - H_nA_n^*R - RA_n^{*T}H_n^T$$

This can be expressed as

$$e(\hat R_n^{jj}) = R^{jj} + F_n^{jj}$$

where

$$F_n = \frac{1}{n}\sum_{i=1}^{n}\Delta F_i$$

If $\hat R_0 = R$ and $\hat Q_0 = Q$, then $P^*_{n|n} = P_{n|n}$, $A_n^*R = P^*_{n|n}H_n^T$, and
from the definition of $\Delta F_n$ it can be seen that $\Delta F_i = 0$
for all i. Then

$$e(\hat R_n^{jj}) = R^{jj}$$

If $\hat R_0 \ne R$ or $\hat Q_0 \ne Q$, then in general $\Delta F_i \ne 0$ and $\hat R_n$ is biased, the bias
equalling $F_n^{jj}$.

The unconditional expected value of $\hat R_n$ follows from the
above:

$$E(\hat R_n^{jj}) = E(R^{jj}) + E(F_n^{jj})$$

Here averaging is done over the ensembles mentioned above
and also over the ensemble of all possible R and Q.

By definition $E(R) = \bar R$, where $\bar R$ is the mean of the
distribution of all possible R values, and

$$E(F_n) = \frac{1}{n}\sum_{i=1}^{n}E(\Delta F_i)$$

But

$$E(\Delta F_i) = H_iE(P_{i|i})H_i^T + H_iP_{i|i}^*H_i^T - H_iA_i^*E(R) - E(R)A_i^{*T}H_i^T$$

$E(P_{i|i})$ can be computed recursively using (2.3.43) and (2.3.44):

$$E(P_{i|i}) = (I - A_i^*H_i)E(P_{i|i-1})(I - A_i^*H_i)^T + A_i^*\bar RA_i^{*T}$$

$$E(P_{i|i-1}) = \Phi(i,i-1)E(P_{i-1|i-1})\Phi^T(i,i-1) + \Gamma_i\bar Q\Gamma_i^T$$

where $\bar Q$ is the mean of the distribution of all possible Q values. Define

$$\bar P_{i|i} = E(P_{i|i})$$

Then

$$E(\Delta F_i) = H_i\bar P_{i|i}H_i^T + H_iP_{i|i}^*H_i^T - H_iA_i^*\bar R - \bar RA_i^{*T}H_i^T$$

and

$$E(\hat R_n^{jj}) = \bar R^{jj} + E(F_n^{jj})$$

If the a priori values of R and Q are assumed to be equal to
the means of their respective distributions, then

$$\hat R_0 = \bar R \quad \text{and} \quad \hat Q_0 = \bar Q$$

and it can be shown that

$$\bar P_{i|i} = P_{i|i}^* \quad \text{for all } i$$

Then

$$E(\Delta F_i) = 0, \qquad E(\hat R_n^{jj}) = \bar R^{jj}$$

Thus $\hat R_n$ is an unbiased estimator of R across the ensemble of
all possible R and Q. However, if $\hat R_0 \ne \bar R$ or $\hat Q_0 \ne \bar Q$, then
$E(\Delta F_i) \ne 0$ and $\hat R_n$ is biased, the bias equalling $E(F_n)$.
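For a time-invariant model the bias term can be computed directly. The filter covariance $P^*_{i|i}$, built from the a priori values, is propagated alongside the actual error covariance $P_{i|i}$. The latter uses the recursion given above with the true R and Q. The sketch below is an illustration with hypothetical model matrices. It returns $F_n = \frac{1}{n}\sum_i\Delta F_i$, which should be zero when the a priori values equal the true ones.

```python
import numpy as np

def bias_F(Phi, H, Gamma, R_true, Q_true, R0, Q0, P0, n):
    """F_n = (1/n) sum_i Delta F_i for the no feedback R estimator.

    P_star : covariance computed by the filter from the a priori R0, Q0
    P_act  : actual error covariance of that filter under R_true, Q_true
    Delta F_i = H (P_act + P_star) H^T - H A*_i R - R A*_i^T H^T
    """
    I = np.eye(P0.shape[0])
    P_star, P_act = P0.copy(), P0.copy()
    dF_sum = np.zeros((H.shape[0], H.shape[0]))
    for _ in range(n):
        Ps_prior = Phi @ P_star @ Phi.T + Gamma @ Q0 @ Gamma.T
        Pa_prior = Phi @ P_act @ Phi.T + Gamma @ Q_true @ Gamma.T
        A = Ps_prior @ H.T @ np.linalg.inv(R0 + H @ Ps_prior @ H.T)   # gain A*_i
        D = I - A @ H
        P_star = D @ Ps_prior                                         # P*_{i|i}
        P_act = D @ Pa_prior @ D.T + A @ R_true @ A.T                 # actual P_{i|i}
        dF_sum += H @ (P_act + P_star) @ H.T - H @ A @ R_true - R_true @ A.T @ H.T
    return dF_sum / n


one = np.eye(1)   # hypothetical scalar model: Phi = H = Gamma = 1
F_ok = bias_F(one, one, one, one, 0.5 * one, one, 0.5 * one, one, 200)
F_bad = bias_F(one, one, one, one, 0.5 * one, 4.0 * one, 0.5 * one, one, 200)
print(F_ok, F_bad)
```

For the scalar values used here with an a priori R that is too large, $F_n$ is positive, so the expected no feedback estimate of R lies above the true value.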

The measures of error of the estimator are chosen to be
the expected squared deviation of the R estimate from the
true value, namely $e[(\hat R_n^{jj} - R^{jj})^2]$ and $E[(\hat R_n^{jj} - R^{jj})^2]$:

$$e[(\hat R_n^{jj} - R^{jj})^2] = e[(\hat R_n^{jj} - e(\hat R_n^{jj}))^2] + [e(\hat R_n^{jj} - R^{jj})]^2$$

$e[(\hat R_n^{jj} - e(\hat R_n^{jj}))^2]$ can be computed recursively by noting that

$$e[(\hat R_n^{jj} - e(\hat R_n^{jj}))^2] = e[(\hat R_n^{jj})^2] - [e(\hat R_n^{jj})]^2$$


The diagonal elements of (4.4.7) are squared and the conditional
expected value then evaluated. Use is made of the fact
that the residuals $\Delta z_n'^*$ are zero mean normal variables in
the no feedback case, and the approximation

$$e[\hat R_{n-1}^{jj}(\Delta z_n'^{*j})^2] \approx e(\hat R_{n-1}^{jj})\,e[(\Delta z_n'^{*j})^2]$$

is used. It can be shown that as the filter approaches
optimality ($\hat R \to R$, $\hat Q \to Q$), the above approximation is
identically satisfied. Using the above approximation, and
after extensive algebraic manipulation,




$$e[(\hat R_n^{jj} - e(\hat R_n^{jj}))^2] = G_n^{jj} \qquad (4.4.15)$$

where

$$G_n^{jj} = \left(\frac{n-1}{n}\right)^2G_{n-1}^{jj} + \frac{2}{n^2}\left(R^{jj} + \Delta F_n^{jj} - (H_nP_{n|n}^*H_n^T)^{jj}\right)^2$$

So

$$e[(\hat R_n^{jj} - R^{jj})^2] = G_n^{jj} + (F_n^{jj})^2 \qquad (4.4.16)$$

In this expression, $(F_n^{jj})^2$ is due to the bias of the estimator
and $G_n^{jj}$ is due to possible deviations from this bias.

The unconditional mean square estimation error follows from
the above:

$$E[(\hat R_n^{jj} - R^{jj})^2] = E(G_n^{jj}) + E[(F_n^{jj})^2]$$

Evaluation of $E(G_n^{jj})$ and $E[(F_n^{jj})^2]$ is extremely complicated,
so the details of their evaluation are given in Appendix B.
Only the results of that evaluation are given here.

Under the assumptions that $\hat R_0 = \bar R$ and $\hat Q_0 = \bar Q$,

$$E(G_n^{jj}) = \left(\frac{n-1}{n}\right)^2E(G_{n-1}^{jj}) + \frac{2}{n^2}\left[\left(\bar R^{jj} - (H_nP_{n|n}^*H_n^T)^{jj}\right)^2 + (C_{n|1}\Sigma_RC_{n|1}^T)^{jj} + (L_{n|1}\Sigma_QL_{n|1}^T)^{jj}\right] \qquad (4.4.17)$$

where

$\Sigma_R$ is a diagonal y × y matrix whose diagonal elements are $E[(R^{jj} - \bar R^{jj})^2]$,

$\Sigma_Q$ is a diagonal η × η matrix whose diagonal elements are $E[(Q^{jj} - \bar Q^{jj})^2]$,

$C_{n|1}$ is a y × y matrix defined by

$$(C_{n|1})_{j\ell} = \sum_{k=1}^{n}\left((H_n\lambda_{n|k}A_k^*)_{j\ell}\right)^2 + \left(1 - 2(H_nA_n^*)_{jj}\right)\delta_{j\ell}$$

$\lambda_{n|k}$ is a β × β matrix defined by (product ordered with the factor for i = n on the left)

$$\lambda_{n|k} = \prod_{i=k+1}^{n}(I - A_i^*H_i)\Phi(i,i-1), \qquad \lambda_{n|n} = I$$

$L_{n|1}$ is a y × η matrix defined by

$$(L_{n|1})_{j\ell} = \sum_{k=1}^{n}\left((H_n\lambda_{n|k}D_k\Gamma_k)_{j\ell}\right)^2$$

$$D_k = I - A_k^*H_k$$

$$U_n = \Phi(n,n-1)P_{n-1|n-1}\Phi^T(n,n-1)$$

Then

$$e(\hat Q_n^{jj}) = Q^{jj} + M_n^{jj} \qquad (4.4.18)$$

where $M_n = \frac{1}{n}\sum_{k=1}^{n}\Delta M_k$. If $\hat R_0 = R$ and $\hat Q_0 = Q$, it can be seen that $\Delta M_k = 0$ for all k.
In this case $\hat Q_n$ is an unbiased estimator for Q. If $\hat R_0 \ne R$
or $\hat Q_0 \ne Q$, then $\hat Q_n$ is biased, the bias equalling $M_n$.

The unconditional expected value of $\hat Q_n$ follows from
the above:

$$E(\hat Q_n^{jj}) = E(Q^{jj}) + E(M_n^{jj}), \qquad E(Q^{jj}) = \bar Q^{jj}, \qquad E(M_n^{jj}) = \frac{1}{n}\sum_{k=1}^{n}E(\Delta M_k^{jj})$$

If $\hat R_0 = \bar R$ and $\hat Q_0 = \bar Q$, then $E(\Delta M_k) = 0$ for all k and
$E(\hat Q_n^{jj}) = \bar Q^{jj}$. If $\hat R_0 \ne \bar R$ or $\hat Q_0 \ne \bar Q$, $\hat Q_n$ is biased across the ensemble of all
possible R and Q, the bias equalling $E(M_n)$.

As with the R estimator, the measures of estimator
error are $e[(\hat Q_n^{jj} - Q^{jj})^2]$ and $E[(\hat Q_n^{jj} - Q^{jj})^2]$:

$$e[(\hat Q_n^{jj} - Q^{jj})^2] = e[(\hat Q_n^{jj} - e(\hat Q_n^{jj}))^2] + [e(\hat Q_n^{jj} - Q^{jj})]^2$$

$e[(\hat Q_n^{jj} - e(\hat Q_n^{jj}))^2]$ can be computed recursively by noting that

$$e[(\hat Q_n^{jj} - e(\hat Q_n^{jj}))^2] = e[(\hat Q_n^{jj})^2] - [e(\hat Q_n^{jj})]^2$$


The diagonal elements of (4.4.13) are squared and the conditional
expected value then evaluated. Use is made of the
approximation

$$e[\hat Q_{n-1}^{jj}(\Gamma_n^{-1}\Delta x_n^*\Delta x_n^{*T}\Gamma_n^{T-1})^{jj}] \approx e(\hat Q_{n-1}^{jj})\,e[(\Gamma_n^{-1}\Delta x_n^*\Delta x_n^{*T}\Gamma_n^{T-1})^{jj}]$$

Using this approximation, and after extensive algebraic
manipulation,


$$e[(\hat Q_n^{jj} - e(\hat Q_n^{jj}))^2] = J_n^{jj} \qquad (4.4.20)$$

where

$$J_n^{jj} = \left(\frac{n-1}{n}\right)^2J_{n-1}^{jj} + \frac{2}{n^2}\left((Q + \Delta M_n - T_n^*)^{jj}\right)^2$$

$$T_n^* = \Gamma_n^{-1}\left(P_{n|n}^* - U_n^*\right)\Gamma_n^{T-1}$$

So

$$e[(\hat Q_n^{jj} - Q^{jj})^2] = J_n^{jj} + (M_n^{jj})^2 \qquad (4.4.21)$$

In this expression, $(M_n^{jj})^2$ is due to the bias of the estimator
and $J_n^{jj}$ is due to possible deviations from this bias. The
unconditional mean square Q estimation error follows from
the above.


$$E[(\hat Q_n^{jj} - Q^{jj})^2] = E(J_n^{jj}) + E[(M_n^{jj})^2]$$

As before, evaluation of $E(J_n^{jj})$ and $E[(M_n^{jj})^2]$ is complicated,
so only the results of such evaluations are given here.

Under the assumptions that $\hat R_0 = \bar R$ and $\hat Q_0 = \bar Q$,

$$E(J_n^{jj}) = \left(\frac{n-1}{n}\right)^2E(J_{n-1}^{jj}) + \frac{2}{n^2}\left[\left((\bar Q - T_n^*)^{jj}\right)^2 + (U_{n|1}\Sigma_RU_{n|1}^T)^{jj} + (W_{n|1}\Sigma_QW_{n|1}^T)^{jj}\right] \qquad (4.4.22)$$


where 


U % 1 is a nxy matrix defined by 


• . \ D & 


W n\l ) 


Y ((g jT D 1 X .A*)*) 2 - ( (g-^ T D 1 A*) £ ) 2 i 
Z_ ^n n n|k k a n n n 


gl = H T f 2 
r:n-. n n 


f J is defined as the j column of a matrix f 
n 


^-1 m 

f = r x a 
n n n 


(m ^) 1 - ((f^) £ ) 2 the square of the 2 th 


of the vector f-* 
n 


W n i 1 XS a 11X11 matr ^ x defined by 





r 

Also from Appendix C , 


E[(M^ j ) 2 ] = 


£^] 2 E[(M«i) 2 ] + 4[(2 n U^U E R D ATl • U A|l E R U ;Il )jj 


where 


+ 12 n w A|i W A|1 - w A|i w Au )jJl (4 - 4 - 23 > 
°A|i 4 k t “An 


k=l 

n 


w A|i 4 s Z "An 


k=l 


When feedback is used, the estimates of R and Q are used
to compute filter gains, so that these filter gains become
random variables under both the conditional and unconditional
expectation operators. Evaluation of the expected values of
nonlinear functions of the R and Q estimates becomes impractical
unless approximations are made. The nature of these
approximations is $e[f(x)] \approx f(e(x))$, where $f(x)$ is a nonlinear
function of a random variable x. As before,


$$\hat R_n^{jj} = \frac{n-1}{n}\hat R_{n-1}^{jj} + \frac{1}{n}\left(\Delta z_n'^*\Delta z_n'^{*T} + H_nP_{n|n}^*H_n^T\right)^{jj}$$

However, now $\Delta z_n'^*$ and $P^*_{n|n}$ are computed using the past estimates
of R and Q. The conditional expected value of $\hat R_n^{jj}$ is then

$$e(\hat R_n^{jj}) = \frac{n-1}{n}e(\hat R_{n-1}^{jj}) + \frac{1}{n}\left[e(\Delta z_n'^*\Delta z_n'^{*T}) + H_ne(P_{n|n}^*)H_n^T\right]^{jj}$$

$P^*_{n|n}$ is a function of $\hat R_{n-1}, \ldots, \hat R_0, \hat Q_{n-1}, \ldots, \hat Q_0$, so that it now
becomes a random variable, and $e(P^*_{n|n}) \ne P^*_{n|n}$, unlike the no feedback
case, where $e(P^*_{n|n}) = P^*_{n|n}$.


$$e(\Delta z_n'^*\Delta z_n'^{*T}) = e(v_nv_n^T) + H_ne(\tilde x_{n|n}\tilde x_{n|n}^T)H_n^T - H_ne(\tilde x_{n|n}v_n^T) - e(v_n\tilde x_{n|n}^T)H_n^T$$

where

$$e(v_nv_n^T) = R \quad \text{as before}$$

$$e(\tilde x_{n|n}\tilde x_{n|n}^T) = e(P_{n|n})$$

$$e(\tilde x_{n|n}v_n^T) = e(A_n^*v_nv_n^T)$$

In the feedback case, $A_n^*$ is a random variable, so it cannot
be taken outside the expectation operator.

$$A_n^* = P_{n|n}^*H_n^T\hat R_{n-1}^{-1}$$

$$P_{n|n}^* = (I - A_n^*H_n)P_{n|n-1}^*(I - A_n^*H_n)^T + A_n^*\hat R_{n-1}A_n^{*T}$$

$$P_{n|n-1}^* = \Phi(n,n-1)P_{n-1|n-1}^*\Phi^T(n,n-1) + \Gamma_n\hat Q_{n-1}\Gamma_n^T$$

$v_n$ is independent of $A_n^*$, as past values of the R and Q estimates are used
to compute $A_n^*$. So

$$e(A_n^*v_nv_n^T) = e(A_n^*)e(v_nv_n^T) = e(A_n^*)R$$

so

$$e(\Delta z_n'^*\Delta z_n'^{*T}) = R + H_ne(P_{n|n})H_n^T - H_ne(A_n^*)R - Re(A_n^{*T})H_n^T$$



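Define

$$\Delta F_n = H_n\left(e(P_{n|n}) + e(P_{n|n}^*)\right)H_n^T - H_ne(A_n^*)R - Re(A_n^{*T})H_n^T$$

Then

$$e(\hat R_n^{jj}) = \frac{n-1}{n}e(\hat R_{n-1}^{jj}) + \frac{1}{n}(R + \Delta F_n)^{jj} = R^{jj} + F_n^{jj} \qquad (4.4.24)$$

where

$$F_n = \frac{1}{n}\sum_{i=1}^{n}\Delta F_i$$

Approximations must be used to evaluate $e(P_{n|n})$, $e(P^*_{n|n})$,
and $e(A_n^*)$.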


$$P_{n|n} = (I - A_n^*H_n)P_{n|n-1}(I - A_n^*H_n)^T + A_n^*RA_n^{*T}$$

$$e(P_{n|n}) = e[(I - A_n^*H_n)P_{n|n-1}(I - A_n^*H_n)^T] + e(A_n^*RA_n^{*T})$$

The following approximations are used:

$$e[(I - A_n^*H_n)P_{n|n-1}(I - A_n^*H_n)^T] \approx e(I - A_n^*H_n)\,e(P_{n|n-1})\,e[(I - A_n^*H_n)^T]$$

$$e(A_n^*RA_n^{*T}) \approx e(A_n^*)\,R\,e(A_n^{*T})$$

$$e(A_n^*) \approx e(P_{n|n}^*)H_n^T[e(\hat R_{n-1})]^{-1}$$

Using these approximations, $e(P_{n|n})$ can be evaluated recursively:

$$e(P_{n|n}) = (I - e(A_n^*)H_n)\,e(P_{n|n-1})\,(I - e(A_n^*)H_n)^T + e(A_n^*)\,R\,e(A_n^{*T}) \qquad (4.4.25)$$

with

$$e(P_{n|n-1}) = \Phi(n,n-1)e(P_{n-1|n-1})\Phi^T(n,n-1) + \Gamma_nQ\Gamma_n^T \qquad (4.4.26)$$

Using the same approximations, $e(P^*_{n|n})$ can be computed
recursively:

$$e(P_{n|n}^*) = (I - e(A_n^*)H_n)\,e(P_{n|n-1}^*)\,(I - e(A_n^*)H_n)^T + e(A_n^*)\,e(\hat R_{n-1})\,e(A_n^{*T}) \qquad (4.4.27)$$

where

$$e(P_{n|n-1}^*) = \Phi(n,n-1)e(P_{n-1|n-1}^*)\Phi^T(n,n-1) + \Gamma_ne(\hat Q_{n-1})\Gamma_n^T \qquad (4.4.28)$$


The unconditional expected value of $\hat R_n$ follows from the
above:

$$E(\hat R_n^{jj}) = E(R^{jj}) + E(F_n^{jj}) = \bar R^{jj} + \frac{1}{n}\sum_{k=1}^{n}E(\Delta F_k^{jj}) \qquad (4.4.29)$$

where

$$E(\Delta F_k) = H_k\left(E(P_{k|k}) + E(P_{k|k}^*)\right)H_k^T - H_kE(e(A_k^*)R) - E(Re(A_k^{*T}))H_k^T$$

Additional approximations must be made to evaluate $E(P_{k|k})$,
$E(P^*_{k|k})$, and $E(e(A_k^*)R)$, namely

$$E(P_{k|k}) \approx E(I - A_k^*H_k)\,E(P_{k|k-1})\,E(I - A_k^*H_k)^T + E(A_k^*)\,\bar R\,E(A_k^{*T}) \qquad (4.4.30)$$

where

$$E(P_{k|k-1}) = \Phi(k,k-1)E(P_{k-1|k-1})\Phi^T(k,k-1) + \Gamma_k\bar Q\Gamma_k^T \qquad (4.4.31)$$

Similarly,

$$E(P_{k|k}^*) \approx E(I - A_k^*H_k)\,E(P_{k|k-1}^*)\,E(I - A_k^*H_k)^T + E(A_k^*)\,E(\hat R_{k-1})\,E(A_k^{*T}) \qquad (4.4.32)$$

where

$$E(P_{k|k-1}^*) = \Phi(k,k-1)E(P_{k-1|k-1}^*)\Phi^T(k,k-1) + \Gamma_kE(\hat Q_{k-1})\Gamma_k^T \qquad (4.4.33)$$

and

$$E(A_k^*) \approx E(P_{k|k}^*)H_k^T[E(\hat R_{k-1})]^{-1}$$


In a similar fashion, the conditional and unconditional
expected values of the estimate of Q can be obtained.

$$e(\hat Q_n^{jj}) = \frac{n-1}{n}e(\hat Q_{n-1}^{jj}) + \frac{1}{n}(Q + \Delta M_n)^{jj} \qquad (4.4.34)$$

where, after algebraic manipulation, $\Delta M_n$ can be expressed in
the following form

$$\Delta M_n = e(\hat Q_{n-1}) - Q + f_n\left[\left(R - e(\hat R_{n-1})\right) + H_n\left(e(P_{n|n-1}) - e(P_{n|n-1}^*)\right)H_n^T\right]f_n^T \qquad (4.4.35)$$

where here

$$f_n = \Gamma_n^{-1}e(A_n^*)$$

Similarly,

$$E(\hat Q_n^{jj}) = \frac{n-1}{n}E(\hat Q_{n-1}^{jj}) + \frac{1}{n}\left(\bar Q + E(\Delta M_n)\right)^{jj}$$

where $E(\Delta M_n)$ is evaluated approximately by using unconditional
expected values instead of conditional expected values in
(4.4.35):

$$E(\Delta M_n) \approx E(\hat Q_{n-1}) - \bar Q + f_n\left[\left(\bar R - E(\hat R_{n-1})\right) + H_n\left(E(P_{n|n-1}) - E(P_{n|n-1}^*)\right)H_n^T\right]f_n^T$$

where here

$$f_n = \Gamma_n^{-1}E(A_n^*)$$


The mean squared estimation error of the R and Q
estimators can be found approximately by using the results
of the no feedback estimators, replacing $P_{n|n}$ by $e(P_{n|n})$ or
$E(P_{n|n})$, $P^*_{n|n}$ by $e(P^*_{n|n})$ or $E(P^*_{n|n})$, etc., the conditional
or unconditional expected values being used depending upon
whether the conditional or unconditional mean squared error
is being evaluated.

Once estimates of R and Q have been found by the above
procedures, some way must be found for incorporating the
a priori estimates of R and Q into a combined estimate of
these quantities. Presumably, along with the a priori estimates
of R and Q, there is available some measure of the quality
of these estimates, say the variances of the estimates.
Expressions have been developed for evaluating the quality
(mean square error) of the estimates based upon measurements
alone. Then a reasonable, but not necessarily optimal, technique
for combining these two estimates would be in some
inverse variance fashion. In the case of the R estimate,

$$\hat R_n^* = \left(\Sigma_{R_n}^{-1} + \Sigma_{R_0}^{-1}\right)^{-1}\left(\Sigma_{R_n}^{-1}\hat R_n + \Sigma_{R_0}^{-1}\hat R_0\right)$$

where $\hat R_n^*$ is the combined estimate, $\hat R_n$ is the estimate based
upon the measurements alone, and $\hat R_0$ is the a priori estimate.
$\Sigma_{R_n}$ is the unconditional mean square R estimation error
matrix $E[(\hat R_n^{jj} - R^{jj})^2]\delta_{jk}$, $\Sigma_{R_0}$ is the variance of the
a priori estimate of R, and

$$\Sigma_{R_n^*}^{-1} = \Sigma_{R_n}^{-1} + \Sigma_{R_0}^{-1}$$

If $E(\hat R_n) = \bar R$ and $\hat R_0 = \bar R$, then

$$\Sigma_{R_n^*} = E[(\hat R_n^{*jj} - R^{jj})^2]\delta_{jk}$$

Similar expressions can be developed for a combined Q estimate.
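Because $\Sigma_{R_n}$ and $\Sigma_{R_0}$ are diagonal, the combination above reduces to an element-by-element weighted average. The following is a minimal sketch. It assumes that the diagonal elements and their variances are supplied as arrays, and the numbers in the example are hypothetical.

```python
import numpy as np

def combine_inverse_variance(R_meas, var_meas, R_prior, var_prior):
    """Element-wise inverse-variance combination of diagonal elements.

    R_meas, var_meas   : measurement-based estimates and their mean square errors
    R_prior, var_prior : a priori estimates and their variances
    Returns the combined estimates and their combined variances.
    """
    w_meas = 1.0 / np.asarray(var_meas, dtype=float)
    w_prior = 1.0 / np.asarray(var_prior, dtype=float)
    var_comb = 1.0 / (w_meas + w_prior)                   # Sigma_{R*}
    R_comb = var_comb * (w_meas * np.asarray(R_meas, dtype=float)
                         + w_prior * np.asarray(R_prior, dtype=float))
    return R_comb, var_comb


# Hypothetical two-element example.
R_comb, var_comb = combine_inverse_variance([2.0, 1.0], [0.1, 0.4], [1.0, 1.5], [0.4, 0.1])
print(R_comb, var_comb)
```

Each combined element is pulled toward whichever of the two estimates has the smaller variance.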


4.5 Review of Procedures Suggested by Others

Several authors have studied the problem of optimal 
filtering when the parameters describing the statistics of 
the measurement and driving noises are not accurately known. 

A brief summary of the results of their work is included 
in this chapter. As will be seen, the estimators are simple 
to use but are suboptimal, that is, no optimality condition 
is satisfied by the solution. In many applications the 
resulting estimators may be biased or may not actually exist. 
However, in some applications, such estimators may provide 
useful solutions to the problem. 

A technique of estimating R and Q developed by Shellenbarger 
(Ref. 31) utilizes the theory of maximum likelihood estimation 
outlined in Chapter 2, but applies the theory only to obtain 
an approximate solution. His technique is based upon estima- 
ting the parameters of R and Q using the information obtained 
at one measurement time and then performing some average over 
current and past estimates to obtain a combined estimate. If 
there is insufficient information available at each measurement 
time to estimate all of the unknown elements of R and Q, then
his technique cannot be used. Unfortunately, many interesting 
applications of optimal filtering have a small dimension 
measurement compared to the dimension of the state so that 
there may be little information at each measurement time upon 
which to base estimates of R and Q. However, if it is assumed 
that the driving noise statistics are precisely known a priori, 
then there is always sufficient information in the measure- 
ments to estimate the statistics of the measurement noise. 

Given n measurements $z_1, \ldots, z_n$, and the conditional maximum
likelihood estimate of the state prior to each measurement
(conditioned upon the assumed values of R and Q), the
joint probability density function of the n measurements can
be written as

$$f(z_1, \ldots, z_n) = f(z_n|z_{n-1}, \ldots, z_1)\,f(z_{n-1}|z_{n-2}, \ldots, z_1)\cdots f(z_1) \qquad (4.5.1)$$

where

$$z_k = H_kx_k + v_k$$

Given that all the assumptions used in deriving the maximum
likelihood state estimator of Chapter 2 are valid,

$$f(z_k|z_{k-1}, \ldots, z_1) = \frac{\exp\left(-\frac{1}{2}\Delta z_k^TB_k^{-1}\Delta z_k\right)}{(2\pi)^{y/2}|B_k|^{1/2}}$$

where

$$\Delta z_k = z_k - H_kx_{k|k-1}, \qquad B_k = R_k + H_kP_{k|k-1}H_k^T$$

Therefore

$$f(z_1, \ldots, z_n) = \prod_{k=1}^{n}\frac{\exp\left(-\frac{1}{2}\Delta z_k^TB_k^{-1}\Delta z_k\right)}{(2\pi)^{y/2}|B_k|^{1/2}}$$
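The factorization (4.5.1) into Gaussian innovation densities can be checked numerically. The sketch below uses a hypothetical scalar random walk with $\Phi = H = \Gamma = 1$ and initial state $x_0 \sim N(m_0, P_0)$. It accumulates $\ln f(z_1, \ldots, z_n)$ from the innovations $\Delta z_k$ and variances $B_k$. It then compares the result with the log density computed directly from the joint covariance of $z_1, \ldots, z_n$.

```python
import numpy as np

def innovation_loglik(z, R, Q, m0, P0):
    """ln f(z_1..z_n) from (4.5.1): sum of Gaussian innovation terms (scalar random walk)."""
    x, P, ll = m0, P0, 0.0                 # x_{k-1|k-1}, P_{k-1|k-1}
    for zk in z:
        P_prior = P + Q                    # P_{k|k-1} with Phi = Gamma = 1
        B = R + P_prior                    # B_k = R + H P_{k|k-1} H^T with H = 1
        dz = zk - x                        # Delta z_k = z_k - H x_{k|k-1}
        ll += -0.5 * (dz**2 / B + np.log(B) + np.log(2.0 * np.pi))
        K = P_prior / B
        x = x + K * dz
        P = P_prior - K * P_prior
    return ll

def direct_loglik(z, R, Q, m0, P0):
    """Same density evaluated directly from the joint covariance of z_1..z_n."""
    n = len(z)
    k = np.arange(1, n + 1)
    C = P0 + Q * np.minimum.outer(k, k) + R * np.eye(n)   # Cov(z_k, z_l)
    d = np.asarray(z, dtype=float) - m0
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (d @ np.linalg.solve(C, d) + logdet + n * np.log(2.0 * np.pi))


rng = np.random.default_rng(4)
z = np.cumsum(rng.normal(size=8))          # arbitrary test data
print(innovation_loglik(z, 1.0, 0.5, 0.0, 2.0), direct_loglik(z, 1.0, 0.5, 0.0, 2.0))
```

The innovation form needs only scalar operations per step, whereas the direct form requires an n × n covariance.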


For estimation of $R_n$ with $Q_1, \ldots, Q_n$ known, Shellenbarger
suggests maximizing $f(z_1, \ldots, z_n)$ with respect to $R_n$ and solving
for $\hat R_n$. However, he realizes that the solution depends upon
the unknown $R_1, \ldots, R_{n-1}$. To solve for all $R_i$, $f(z_1, \ldots, z_n)$
would have to be maximized with respect to $R_i$ $(1 \le i \le n)$ and
the resulting equations solved for $\hat R_i$. Shellenbarger dismisses
this approach as being infeasible for any nontrivial system.
Rather than simultaneously estimating all $R_i$, he suggests
that the single measurement conditional likelihood function
$f(z_n|z_{n-1}, \ldots, z_1)$ be maximized with respect to $R_n$, using past
estimates of $R_1, \ldots, R_{n-1}$ to compute the necessary quantities
appearing in this likelihood function.


$$\frac{\partial \ln f(z_n|z_{n-1}, \ldots, z_1)}{\partial R_n} = -\frac{1}{2}B_n^{-1}\left(B_n - \Delta z_n\Delta z_n^T\right)B_n^{-1} \qquad (4.5.2)$$

(4.5.2) is set to zero and the resulting equation solved for
$\hat R_n$. $x_{n|n-1}$ and $P_{n|n-1}$ are not precisely known, as they are
defined as the maximum likelihood estimate of the state and
its covariance conditioned upon the true values of $R_1, \ldots, R_{n-1}$.
For this solution they must be evaluated using some average
of the estimates of the past $R_i$. The results of such a
procedure are

$$B_n^{*-1}\left(B_n^* - \Delta z_n^*\Delta z_n^{*T}\right)B_n^{*-1} = 0 \qquad (4.5.3)$$
where

$$B_n^* = \hat R_n + H_nP_{n|n-1}^*H_n^T, \qquad \Delta z_n^* = z_n - H_nx_{n|n-1}^*$$

$x^*_{n|n-1}$ and $P^*_{n|n-1}$ are the values of $x_{n|n-1}$ and $P_{n|n-1}$
evaluated recursively using some average of past estimates
of R at each updating time. Then the estimate of $R_n$ is
defined by

$$\hat R_n = \Delta z_n^*\Delta z_n^{*T} - H_nP_{n|n-1}^*H_n^T \qquad (4.5.4)$$

The conditional expected value of $\hat R_n$ is

$$e(\hat R_n) = e[(v_n - H_n\tilde x_{n|n-1}^*)(v_n - H_n\tilde x_{n|n-1}^*)^T] - H_ne(P_{n|n-1}^*)H_n^T = R_n + H_n\left(e(\tilde x_{n|n-1}^*\tilde x_{n|n-1}^{*T}) - e(P_{n|n-1}^*)\right)H_n^T$$


$e(\tilde x^*_{n|n-1}\tilde x^{*T}_{n|n-1})$ represents the conditional covariance of the
state estimation error, conditioned upon the true values of
R and Q and the fact that estimates of the past values of the
measurement noise covariance matrices have been used in computing
filter gains. $e(P^*_{n|n-1})$ represents the average (over
the ensemble of all measurement and driving noises) computed
state error covariance matrix. As was shown in the previous
section, when past values of estimates of R or Q are used to
compute filter gains, evaluation of these two quantities is
exceedingly difficult and in general cannot be performed
without approximations. Shellenbarger states without proof
that

$$e(\tilde x_{n|n-1}^*\tilde x_{n|n-1}^{*T}) = e(P_{n|n-1}^*) \qquad (4.5.5)$$

and thus concludes that

$$e(\hat R_n) = R_n \qquad (4.5.6)$$

This demonstration of unbiasedness depends upon the validity
of (4.5.5), something that Shellenbarger does not adequately
discuss.

The case of estimating $Q_n$ with $R_1, \ldots, R_n$ known is considerably
more complicated than the previous case. The solution
depends upon the rank of the measurement matrix $H_n$. The
forcing function matrix $\Gamma_n$ is presumed to be the identity
matrix. The single measurement conditional likelihood function
is maximized with respect to $Q_n$, using past estimates of
$Q_1, \ldots, Q_{n-1}$ to compute the necessary quantities appearing in
this likelihood function.

$$\frac{\partial \ln f(z_n|z_{n-1}, \ldots, z_1)}{\partial Q_n} = -\frac{1}{2}H_n^TB_n^{-1}\left(B_n - \Delta z_n\Delta z_n^T\right)B_n^{-1}H_n \qquad (4.5.7)$$

where

$$B_n = R_n + H_n\left(\Phi(n,n-1)P_{n-1|n-1}\Phi^T(n,n-1) + Q_n\right)H_n^T$$

(4.5.7) is set to zero and the resulting equation solved for
$\hat Q_n$. As in the estimation of $R_n$, $x_{n|n-1}$ and $P_{n-1|n-1}$ are
evaluated using some average of past estimates of $Q_i$.
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0 


(4.5.8) 


T 1 *— 1 T *. *-l 

H n B n (H n Q n H n " C n )B n H n " 


where C = Az Az T - R - H $(n,n-l)P ,i ■, $ T (n,n-l)H^ 
n n n n n n-±|n-l n 


If $H_n$ is square and possesses an inverse, then

$$
\hat Q_n = H_n^{-1} C_n (H_n^T)^{-1} \tag{4.5.9}
$$

The conditional expected value of $\hat Q_n$ is then

$$
e(\hat Q_n) = Q_n + \Phi(n,n-1)\left[e(\tilde x^*_{n-1|n-1}\tilde x^{*T}_{n-1|n-1}) - e(P^*_{n-1|n-1})\right]\Phi^T(n,n-1) \tag{4.5.10}
$$

Again, Shellenbarger assumes that

$$
e(\tilde x^*_{n-1|n-1}\tilde x^{*T}_{n-1|n-1}) = e(P^*_{n-1|n-1}) \tag{4.5.11}
$$

and thus concludes that

$$
e(\hat Q_n) = Q_n \tag{4.5.12}
$$

The same comments apply here as before concerning the validity of (4.5.11).

If $H_n^{-1}$ does not exist, but $(H_n^T Y_n^{-1} H_n)^{-1}$ exists, where

$$
Y_n = R_n + H_n\Phi(n,n-1)P^*_{n-1|n-1}\Phi^T(n,n-1)H_n^T
$$

then a solution for $Q_n$ can be obtained from (4.5.8) by finding $B_n^{*-1}$ using the matrix inversion lemma and carrying out some matrix manipulation:

$$
\hat Q_n = (H_n^T Y_n^{-1} H_n)^{-1} H_n^T Y_n^{-1} C_n Y_n^{-1} H_n (H_n^T Y_n^{-1} H_n)^{-1} \tag{4.5.13}
$$

The conditional expected value of this estimate of $Q_n$ is again given by (4.5.10).
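A minimal NumPy sketch of the closed form (4.5.13), assuming illustrative values for $H_n$ and $Y_n$ (taken diagonal for convenience). Here $C_n$ is replaced by the idealized value $H_nQ_nH_n^T$ it would have if the bracketed term in (4.5.10) vanished; under that assumption the estimate reproduces $Q_n$ exactly. `q_hat_weighted` is a hypothetical name.

```python
import numpy as np

def q_hat_weighted(C, H, Y):
    """Estimate (4.5.13): (H^T Y^-1 H)^-1 H^T Y^-1 C Y^-1 H (H^T Y^-1 H)^-1."""
    Y_inv = np.linalg.inv(Y)
    M_inv = np.linalg.inv(H.T @ Y_inv @ H)
    return M_inv @ H.T @ Y_inv @ C @ Y_inv @ H @ M_inv

# Illustrative case with more measurements (4) than states (2).
rng = np.random.default_rng(1)
H = rng.standard_normal((4, 2))
Y = np.diag([1.0, 0.5, 2.0, 1.5])        # stands in for Y_n
Q_true = np.array([[0.5, 0.1], [0.1, 0.3]])

# C_n replaced by the idealized value H Q H^T (bracket in (4.5.10) set to zero).
Q_est = q_hat_weighted(H @ Q_true @ H.T, H, Y)
print(np.allclose(Q_est, Q_true))        # True
```

The weighting by $Y_n^{-1}$ only matters when $C_n$ contains the residual state-error term; in the idealized case any full-column-rank $H_n$ returns $Q_n$.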

If neither $H_n^{-1}$ nor $(H_n^T Y_n^{-1} H_n)^{-1}$ exists, then a unique solution of (4.5.7) does not exist. However, by use of the generalized inverse of $H_n$, a particular solution can be defined which satisfies (4.5.8):

$$
\hat Q_n = H_n^{\#} C_n H_n^{\#T} \tag{4.5.14}
$$

where $H_n^{\#}$ is the generalized inverse of $H_n$. If $(H_n H_n^T)^{-1}$ exists, then

$$
H_n^{\#} = H_n^T (H_n H_n^T)^{-1} \tag{4.5.15}
$$

The conditional expected value of (4.5.14) using (4.5.15) is

$$
\begin{aligned}
e(\hat Q_n) ={}& H_n^T (H_n H_n^T)^{-1} H_n Q_n H_n^T (H_n H_n^T)^{-1} H_n \\
&+ H_n^T (H_n H_n^T)^{-1} H_n \Phi(n,n-1)\left[e(\tilde x^*_{n-1|n-1}\tilde x^{*T}_{n-1|n-1}) - e(P^*_{n-1|n-1})\right]\Phi^T(n,n-1) H_n^T (H_n H_n^T)^{-1} H_n
\end{aligned}
\tag{4.5.16}
$$
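When $H_n$ has fewer rows than columns, the estimate (4.5.14) can only recover the part of $Q_n$ that the measurements see. The sketch below uses arbitrary illustrative matrices and the idealized $C_n = H_nQ_nH_n^T$ (the bracketed term of (4.5.16) set to zero). It checks that (4.5.15) agrees with NumPy's Moore-Penrose pseudoinverse for a full-row-rank $H_n$, and that the estimate equals the projected matrix in the first term of (4.5.16).

```python
import numpy as np

rng = np.random.default_rng(2)
H = rng.standard_normal((2, 4))            # 2 measurements, 4 states
Q_true = np.diag([0.4, 0.3, 0.2, 0.1])

# Generalized inverse (4.5.15); needs H H^T to be invertible.
H_sharp = H.T @ np.linalg.inv(H @ H.T)
print(np.allclose(H_sharp, np.linalg.pinv(H)))   # matches Moore-Penrose

# Estimate (4.5.14) with the idealized C_n = H Q H^T.
Q_est = H_sharp @ (H @ Q_true @ H.T) @ H_sharp.T

# The result is Q projected onto the row space of H, as the first term of
# (4.5.16) indicates, rather than Q itself.
Proj = H_sharp @ H                           # H^T (H H^T)^-1 H
print(np.allclose(Q_est, Proj @ Q_true @ Proj))
print(np.allclose(Q_est, Q_true))
```

The last comparison fails because `Proj` has rank 2 while `Q_true` has rank 4, which is the bias made explicit by (4.5.16).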


As with the explicit suboptimal estimator, the real difficulty in the use of Shellenbarger's method arises when the elements of $R_n$ and $Q_n$ are to be estimated simultaneously. (4.5.3) and (4.5.8) must be solved for $R_n$ and $Q_n$. However, these equations cannot be solved uniquely for the two quantities, since they are not independent. In essence, Shellenbarger suggests that the number of unknown elements of $R_n$ and $Q_n$ be reduced until the number of unknowns is equal to the number of independent equations. If $R_n$ is assumed to be diagonal, the number of unknown elements in $R_n$ is reduced from $\gamma(\gamma+1)/2$ to $\gamma$. However, a solution for these diagonal elements and $Q_n$ is possible only when there are redundant measurements, or $(H_n^T H_n)^{-1}$ exists. In such a case,

$$
c(\hat R_n) = (I - N_n \circ N_n)^{-1}\, c\!\left(\Delta z^*_n \Delta z^{*T}_n - N_n \Delta z^*_n \Delta z^{*T}_n N_n\right) \tag{4.5.17}
$$

where $c(\cdot)$ represents a column vector whose elements are the diagonal elements of the matrix argument and

$$
N_n = H_n (H_n^T H_n)^{-1} H_n^T
$$

$N_n \circ N_n$ is a matrix whose elements are the squares of the corresponding elements of $N_n$.
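Equation (4.5.17) can be checked on the idealized mean of the residual outer product, $R_n + H_n S H_n^T$ for some symmetric $S$: the projection $N_n$ removes the state-error part because $N_nH_n = H_n$, and the diagonal of $R_n - N_nR_nN_n$ is linear in the diagonal of $R_n$. The sketch below assumes a diagonal $R_n$ and arbitrary illustrative matrices with more measurements than states.

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.standard_normal((5, 2))            # redundant measurements: 5 > 2
R_diag = np.array([0.5, 1.0, 0.8, 1.2, 0.3])
S = np.array([[2.0, 0.3], [0.3, 1.0]])     # any symmetric state-error term

# Idealized mean of the residual outer product: R + H S H^T.
M = np.diag(R_diag) + H @ S @ H.T

N = H @ np.linalg.inv(H.T @ H) @ H.T       # projection onto range(H)
NN = N * N                                  # elementwise squares of N
rhs = np.diag(M - N @ M @ N)                # c(M - N M N)
R_est = np.linalg.solve(np.eye(5) - NN, rhs)   # (4.5.17)
print(np.allclose(R_est, R_diag))          # True
```

$I - N_n \circ N_n$ is invertible here because each row of $N_n \circ N_n$ sums to the diagonal element $N^{ii}_n < 1$, which makes $I - N_n \circ N_n$ strictly diagonally dominant.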

Once the diagonal elements of $\hat R_n$ are obtained, $\hat Q_n$ can be obtained from (4.5.13) using the estimates of $R_n$ in place of the unknown $R_n$. Clearly this technique is applicable only when redundant measurements are taken at each measurement time. In most applications the dimension of the state is large compared with that of the measurement, in which case the unknown elements of $R_n$ and $Q_n$ cannot be simultaneously estimated by Shellenbarger's method.

Dennis (Ref. 5) uses an entirely different technique for obtaining estimators for R and Q, but as in the case of Shellenbarger's method, he essentially relies upon having sufficient information in each measurement to define estimates of R and Q based upon a single residual. If sufficient information is not available, or if some components of the driving noise are not observable from one residual alone, then Dennis suggests lagging the driving noise variance estimation with respect to the measurement noise variance estimation, that is, using past as well as current residuals to obtain some estimate of the driving noise covariance.

Dennis obtains a functional relationship between certain residuals and the measurement and driving noises. From this relationship he postulates the form of the estimators. No criterion of optimality is used, and his proofs of the unbiasedness and stability of the resulting estimation loop are questionable.

At each measurement time, the existence of a minimum variance or maximum likelihood state estimator is presumed, with estimates of R and Q used to compute the proper residual weighting matrices. From the recursive state estimate updating equation (2.3.38),

$$
\hat x^*_{n|n} = \Phi(n,n-1)\hat x^*_{n-1|n-1} + A^*_n\left(z_n - H_n\Phi(n,n-1)\hat x^*_{n-1|n-1}\right)
$$

The assumed models for the state and measurement are

$$
x_n = \Phi(n,n-1)x_{n-1} + \Gamma_n w_n
$$

$$
z_n = H_n x_n + v_n
$$

Then the following expression can be obtained for the estimation error:

$$
\tilde x^*_{n|n} = \hat x^*_{n|n} - x_n = (I - A^*_n H_n)\Phi(n,n-1)\tilde x^*_{n-1|n-1} + A^*_n v_n + (A^*_n H_n - I)\Gamma_n w_n
$$


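Consider the three residuals ($\theta$ being the dimension of the state)

$$
\hat r^m_n \triangleq z_n - H_n\hat x^*_{n|n} \qquad \text{a } \gamma\times 1 \text{ vector}
$$

$$
r^m_n \triangleq z_n - H_n\Phi(n,n-1)\hat x^*_{n-1|n-1} \qquad \text{a } \gamma\times 1 \text{ vector}
$$

$$
r^s_n \triangleq \hat x^*_{n|n} - \Phi(n,n-1)\hat x^*_{n-1|n-1} \qquad \text{a } \theta\times 1 \text{ vector}
$$

It can be shown that

$$
\hat r^m_n = (I - H_n A^*_n)v_n + H_n(I - A^*_n H_n)\Gamma_n w_n + H_n(A^*_n H_n - I)\Phi(n,n-1)\tilde x^*_{n-1|n-1} \tag{4.5.18}
$$

$$
r^m_n = v_n + H_n\Gamma_n w_n - H_n\Phi(n,n-1)\tilde x^*_{n-1|n-1} \tag{4.5.19}
$$

$$
r^s_n = A^*_n v_n + A^*_n H_n\Gamma_n w_n - A^*_n H_n\Phi(n,n-1)\tilde x^*_{n-1|n-1} \tag{4.5.20}
$$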
These equations are singular in the sense that $v_n$ and $w_n$ can never be exactly determined from the residuals alone. However, in terms of squared residuals, some nonsingular mappings of averages can be obtained.
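The residual identities can be verified numerically for any gain. The sketch below takes the three residuals to be $z_n - H_n\hat x^*_{n|n}$, $z_n - H_n\Phi(n,n-1)\hat x^*_{n-1|n-1}$, and $\hat x^*_{n|n} - \Phi(n,n-1)\hat x^*_{n-1|n-1}$ (a reading of the definitions that is consistent with (4.5.18)-(4.5.20)), uses the error convention $\tilde x = \hat x - x$, and draws all quantities at random; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, gamma = 3, 2
Phi = np.eye(theta) + 0.1 * rng.standard_normal((theta, theta))
Gam = rng.standard_normal((theta, theta))      # Gamma_n
H = rng.standard_normal((gamma, theta))
A = rng.standard_normal((theta, gamma))        # any gain A_n^*
I_t, I_g = np.eye(theta), np.eye(gamma)

x_prev, xhat_prev = rng.standard_normal(theta), rng.standard_normal(theta)
w, v = rng.standard_normal(theta), rng.standard_normal(gamma)

x = Phi @ x_prev + Gam @ w                     # state model
z = H @ x + v                                  # measurement model
xhat_pred = Phi @ xhat_prev
xhat = xhat_pred + A @ (z - H @ xhat_pred)     # update (2.3.38)
xt_prev = xhat_prev - x_prev                   # error convention xhat - x

r_post = z - H @ xhat                          # residual of (4.5.18)
r_m = z - H @ xhat_pred                        # residual of (4.5.19)
r_s = xhat - xhat_pred                         # residual of (4.5.20)

ok_18 = np.allclose(r_post, (I_g - H @ A) @ v + H @ (I_t - A @ H) @ Gam @ w
                    + H @ (A @ H - I_t) @ Phi @ xt_prev)
ok_19 = np.allclose(r_m, v + H @ Gam @ w - H @ Phi @ xt_prev)
ok_20 = np.allclose(r_s, A @ r_m)
print(ok_18, ok_19, ok_20)
```

The last check shows directly why the state residual adds no independent information when the gain is known: it is the gain applied to the measurement residual.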

Begin by considering the two residuals $r^m_n$ and $r^s_n$ in the following form.


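$$
\begin{bmatrix} r^m_n \\ r^s_n \end{bmatrix} =
\begin{bmatrix} I & H_n\Gamma_n \\ A^*_n & A^*_n H_n\Gamma_n \end{bmatrix}
\begin{bmatrix} v_n \\ w_n \end{bmatrix} -
\begin{bmatrix} H_n\Phi(n,n-1) \\ A^*_n H_n\Phi(n,n-1) \end{bmatrix}
\tilde x^*_{n-1|n-1} \tag{4.5.21}
$$

or

$$
\bar r_n = K_n \phi_n - C_n \tilde x^*_{n-1|n-1} \tag{4.5.22}
$$

where

$$
\bar r_n^T = (r^{mT}_n,\ r^{sT}_n), \qquad \phi_n^T = (v_n^T,\ w_n^T),
$$

$$
K_n = \begin{bmatrix} I & H_n\Gamma_n \\ A^*_n & A^*_n H_n\Gamma_n \end{bmatrix}, \qquad
C_n = \begin{bmatrix} H_n\Phi(n,n-1) \\ A^*_n H_n\Phi(n,n-1) \end{bmatrix}
$$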

Consider the $i$th element of $\bar r_n$:

$$
\bar r^i_n = \sum_j K^{ij}_n \phi^j_n - \sum_j C^{ij}_n \tilde x^{*j}_{n-1|n-1}
$$

Squaring $\bar r^i_n$ leads to

$$
(\bar r^i_n)^2 = \sum_j (K^{ij}_n)^2(\phi^j_n)^2 + \sum_j\sum_{k\neq j} K^{ij}_n K^{ik}_n \phi^j_n\phi^k_n - 2\sum_j\sum_k K^{ij}_n C^{ik}_n \phi^j_n \tilde x^{*k}_{n-1|n-1} + \sum_j\sum_k C^{ij}_n C^{ik}_n \tilde x^{*j}_{n-1|n-1}\tilde x^{*k}_{n-1|n-1} \tag{4.5.23}
$$


Assuming that $v_n$ and $w_n$ are mutually independent zero mean normal random vectors with time independent statistics (and with mutually independent components, so that $R_n$ and $Q_n$ are diagonal), the only terms of interest in (4.5.23) are the first and last, because the average values of the others are zero. Therefore

$$
(\bar r^i_n)^2 = \sum_j (K^{ij}_n)^2(\phi^j_n)^2 + \sum_j\sum_k C^{ij}_n C^{ik}_n \tilde x^{*j}_{n-1|n-1}\tilde x^{*k}_{n-1|n-1} + \alpha^i_n \tag{4.5.24}
$$

where $\alpha^i_n$ is the sum of all other terms in (4.5.23) and is by definition zero mean.

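Next Dennis assumes that $(\phi^j_n)^2$ is a Rayleigh variable having mean $\bar R^j_n$ or $\bar Q^j_n$ as appropriate, where $\bar R_n$ and $\bar Q_n$ are vectors of the diagonal elements of $R_n$ and $Q_n$. Thus

$$
\begin{bmatrix} (r^m_n)^2 \\ (r^s_n)^2 \end{bmatrix} = \mathcal K_n \begin{bmatrix} \bar R_n + \zeta_n \\ \bar Q_n + \xi_n \end{bmatrix} + S_n + \alpha_n \tag{4.5.25}
$$

where $\mathcal K_n$ is a matrix composed of the squared elements of $K_n$, $\zeta_n$ and $\xi_n$ are zero mean random vectors,

$$
S_n = \begin{bmatrix} C^1_n \tilde x^*_{n-1|n-1}\tilde x^{*T}_{n-1|n-1} C^{1T}_n \\ \vdots \\ C^{\gamma+\theta}_n \tilde x^*_{n-1|n-1}\tilde x^{*T}_{n-1|n-1} C^{(\gamma+\theta)T}_n \end{bmatrix}
$$

where $C^{jT}_n$ is the $j$th column of $C^T_n$ (so that $C^j_n$ is the $j$th row of $C_n$),

$$
\alpha_n^T = (\alpha^1_n, \dots, \alpha^{\gamma+\theta}_n)
$$

and $(r^m_n)^2$ and $(r^s_n)^2$ denote vectors whose elements are the squares of the elements of $r^m_n$ and $r^s_n$, respectively.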

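(4.5.25) is central to Dennis's development of noise variance estimation. In particular, it can be seen that if $\mathcal K_n$ is of full rank then

$$
\begin{bmatrix} \bar R_n + \zeta_n \\ \bar Q_n + \xi_n \end{bmatrix} = \mathcal K_n^{-1}\begin{bmatrix} (r^m_n)^2 \\ (r^s_n)^2 \end{bmatrix} - \mathcal K_n^{-1} S_n - \mathcal K_n^{-1}\alpha_n
$$

or

$$
\begin{bmatrix} \bar R_n \\ \bar Q_n \end{bmatrix} = \mathcal K_n^{-1}\begin{bmatrix} (r^m_n)^2 \\ (r^s_n)^2 \end{bmatrix} - \mathcal K_n^{-1} S_n - \mathcal K_n^{-1}\alpha_n - \begin{bmatrix} \zeta_n \\ \xi_n \end{bmatrix} \tag{4.5.26}
$$

From an examination of (4.5.26) Dennis postulates the form of the estimator for $R_n$ and $Q_n$ based upon one set of residuals.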
$$
\begin{bmatrix} \hat{\bar R}_n \\ \hat{\bar Q}_n \end{bmatrix} = \mathcal K_n^{-1}\begin{bmatrix} (r^m_n)^2 \\ (r^s_n)^2 \end{bmatrix} - \mathcal K_n^{-1}\hat S_n
$$

where $\hat S_n$ is $S_n$ with $P^*_{n-1|n-1}$ substituted for $\tilde x^*_{n-1|n-1}\tilde x^{*T}_{n-1|n-1}$.
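The algebra behind (4.5.25) and (4.5.26) can be illustrated by applying the single-residual estimator to the mean squared residuals, for which $\zeta_n$, $\xi_n$, and $\alpha_n$ average out. The sketch assumes independent noise components (diagonal $R_n$ and $Q_n$), a known $E[\tilde x^*\tilde x^{*T}]$, and a hypothetical system with two measurements and one state. It shows the bookkeeping only, not the statistical behavior of the estimator. It also shows that for a scalar measurement and a scalar state $\mathcal K_n$ is always singular, because its determinant is $a^2g^2 - g^2a^2 = 0$, with $a$ the gain and $g = H_n\Gamma_n$.

```python
import numpy as np

# Hypothetical system: 2 measurements, 1 state, Gamma_n = 1.
Phi = np.array([[1.0]])
Gam = np.array([[1.0]])
H = np.array([[1.0], [2.0]])
A = np.array([[0.3, 0.2]])                    # assumed gain A_n^*

# Stacked matrices of (4.5.22): rbar = K phi - C xtilde.
K = np.block([[np.eye(2), H @ Gam], [A, A @ H @ Gam]])
C = np.vstack([H @ Phi, A @ H @ Phi])
Kcal = K * K                                  # squared elements of K_n

R_bar = np.array([0.5, 0.8])                  # diagonal of R_n
Q_bar = np.array([0.2])                       # diagonal of Q_n
P = np.array([[0.3]])                         # assumed E[xtilde xtilde^T]

# Mean of the squared stacked residual for independent noise components:
# Kcal [R; Q] + diag(C P C^T).
S_hat = np.diag(C @ P @ C.T)                  # S_n with P substituted
mean_sq = Kcal @ np.concatenate([R_bar, Q_bar]) + S_hat

# Single-residual estimator of the form (4.5.26), applied to the mean.
est = np.linalg.solve(Kcal, mean_sq - S_hat)
print(np.allclose(est, np.concatenate([R_bar, Q_bar])))   # True

# Scalar measurement and scalar state: Kcal is always singular.
g, a_gain = 1.5, 0.4
K1 = np.array([[1.0, g], [a_gain, a_gain * g]])
print(abs(np.linalg.det(K1 * K1)) < 1e-12)               # True
```

The singular scalar case is one situation in which the time-averaged procedure described next is required.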

The above estimates are not those used for the computation of the filter gain matrices; rather, some average of past estimates is used. This allows for some "smoothing" of the single residual set estimates:

$$
\bar R^*_n = \frac{\sum_{j=1}^{n} \omega^1_j \hat{\bar R}_j}{\sum_{j=1}^{n} \omega^1_j}, \qquad
\bar Q^*_n = \frac{\sum_{j=1}^{n} \omega^2_j \hat{\bar Q}_j}{\sum_{j=1}^{n} \omega^2_j}
$$

where $\omega^1_j$ and $\omega^2_j$ are weighting factors that can be arbitrarily chosen.

The conditional expected value of such an estimate is most difficult to obtain, since the matrix $\mathcal K_n$ is a random function of the previous estimates of R and Q. Dennis does show that for scalar measurements and no driving noise the estimate $\bar R^*_n$ is, to first order, independent of variations in the value of R used to compute the gain, and hence $\mathcal K_n$. He states that this is true whether or not driving noise is present, but does not show that the estimate $\bar R^*_n$ is independent of variations in the value of Q used to compute $\mathcal K_n$. He also states that the estimate $\bar Q^*_n$ is independent of variations in the Q used to compute $\mathcal K_n$ for a scalar state variable. What all these statements mean with respect to the biasedness of the estimates in realistic situations is not clear.

If the matrix $\mathcal K_n$ is singular, then a slightly different procedure must be used. It is assumed that R and Q are constant or slowly varying in time, and a time average of (4.5.25) is taken prior to inversion:



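$$
\sum_{j=m}^{n}\Omega_j \begin{bmatrix} (r^m_j)^2 \\ (r^s_j)^2 \end{bmatrix} = \sum_{j=m}^{n}\Omega_j \mathcal K_j \begin{bmatrix} \bar R_j + \zeta_j \\ \bar Q_j + \xi_j \end{bmatrix} + \sum_{j=m}^{n}\Omega_j S_j + \sum_{j=m}^{n}\Omega_j \alpha_j
$$

where the $\Omega_j$ are arbitrary weighting factors with $\sum_{j=m}^{n}\Omega_j = 1$. If $\left[\sum_{j=m}^{n}\Omega_j\mathcal K_j\right]^{-1}$ exists,

$$
\begin{bmatrix} \hat{\bar R}_n \\ \hat{\bar Q}_n \end{bmatrix} = \left[\sum_{j=m}^{n}\Omega_j\mathcal K_j\right]^{-1}\sum_{j=m}^{n}\Omega_j\left(\begin{bmatrix} (r^m_j)^2 \\ (r^s_j)^2 \end{bmatrix} - \hat S_j\right)
$$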

Dennis attempts to show that for some $n$ the weighted $\mathcal K_j$ matrix is nonsingular. However, using his own analysis, if the measurement matrix is time invariant, the weighted matrix is always singular if $\mathcal K_j$ itself is, which limits the applicability of the solution to cases in which this is not true.

Smith (Ref. 33) has studied the problem of real time 
estimation of the state and the measurement noise covariance 
but only obtains a suboptimal solution of the problem. In 
his dynamical model of the state, there may be noise driving 
the state but it is assumed that the statistics of this noise 
are precisely known. 

The state obeys the recursive relationship

$$
x_n = \Phi(n,n-1)x_{n-1} + w_n \tag{4.5.27}
$$

The measurements of the system state have the usual form

$$
z_n = H_n x_n + v_n \tag{4.5.28}
$$

where $z_n$ is a $\gamma\times 1$ vector. This single vector measurement is equivalent to $\gamma$ scalar measurements when the measurements are independent, or equivalently, when the measurement noise covariance matrix is diagonal. In this case, the scalar measurement at time $n$ is given by

$$
z^j_n = h^{jT}_n x_n + v^j_n \tag{4.5.29}
$$

where $h^j_n$ is the $j$th column of the matrix $H^T_n$.

It is assumed that the initial value of the state is normally distributed and that $w_n$ is also normal. The distribution of each $v^j_n$ is also normal with zero mean and variance $R^{jj}_n$. It is further assumed that $R^{jj}_n$ can be represented as

$$
R^{jj}_n = k^j R^{jj}_{\mathrm{nom}_n} \tag{4.5.30}
$$

where $k^j$ is a time invariant but unknown precision factor associated with each component of the measurement. Smith assumes that each $k^j$ has an inverted gamma distribution which describes the a priori uncertainty in the value of $k^j$. The form (4.5.30) is used for $R_n$ so that deterministic time-varying characteristics of $R_n$ can be easily modeled. The probability density function of each $k^j$ can be represented as

$$
f(k) = \begin{cases} C\left(\dfrac{b}{k}\right)^{\frac{a}{2}+1} e^{-\frac{ab}{2k}}, & k > 0 \\[1ex] 0, & k < 0 \end{cases} \tag{4.5.31}
$$

where

$$
C = \frac{\left(\frac{a}{2}\right)^{\frac{a}{2}+1}}{\Gamma\!\left(\frac{a}{2}+1\right) b}
$$

and $a$ and $b$ are parameters of the distribution of $k$. The mean of this distribution is proportional to $b$:

$$
E(k) = \int_0^\infty k\, f(k)\, dk = \frac{a}{a-2}\, b, \qquad a > 2 \tag{4.5.32}
$$
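Reading (4.5.31) as an inverted gamma density with shape $a/2$ and scale $ab/2$ (an interpretation of the scanned constants, checked here by normalization), its moments can be computed numerically. The sketch below uses the substitution $u = 1/k$ and Simpson's rule. It confirms that the density integrates to one and that its mean is $ab/(a-2)$, which approaches $b$ as $a$ grows, in line with Smith's large-$a$ argument. The function names are hypothetical.

```python
import math

def inv_gamma_pdf(k, a, b):
    """Density (4.5.31) read as an inverted gamma density with shape a/2 and
    scale ab/2:  f(k) = C (b/k)^(a/2+1) exp(-ab/(2k)) for k > 0, else 0,
    with C = (a/2)^(a/2) / (Gamma(a/2) b)."""
    if k <= 0.0:
        return 0.0
    log_c = (a / 2) * math.log(a / 2) - math.lgamma(a / 2) - math.log(b)
    return math.exp(log_c + (a / 2 + 1) * math.log(b / k) - a * b / (2 * k))

def moment(a, b, power, u_max=10.0, n=20000):
    """int_0^inf k^power f(k) dk via the substitution u = 1/k and Simpson's rule."""
    h = u_max / n

    def g(u):
        if u == 0.0:
            return 0.0                 # integrand vanishes at u = 0 when a > 4
        k = 1.0 / u
        return k**power * inv_gamma_pdf(k, a, b) / u**2

    total = g(0.0) + g(u_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

for a in (10, 50, 200):
    print(a, moment(a, 2.0, 0), moment(a, 2.0, 1), a * 2.0 / (a - 2))
```

The interval $u \le 10$ is adequate for these parameter values because the integrand decays like $e^{-abu/2}$; smaller $ab$ would need a larger `u_max`.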


The joint conditional probability density function of the state $x_n$ and the parameter $k$ is given by

$$
f(x_n, k \mid Z_n) = \frac{f(x_n, k \mid Z_{n-1})\, f(z_n \mid x_n, k, Z_{n-1})}{f(z_n \mid Z_{n-1})} \tag{4.5.33}
$$

where $Z_n$ represents the vector of $n$ measurements, $Z_{n-1}$ represents the vector of $n-1$ measurements, and

$$
f(z_n \mid Z_{n-1}) = \int_0^\infty f(z_n \mid Z_{n-1}, k)\, f(k)\, dk \tag{4.5.34}
$$


The conditional probability density function of the measurement $z_n$ given $k$, $x_n$, and $Z_{n-1}$ is

$$
f(z_n \mid x_n, k, Z_{n-1}) = \frac{1}{(2\pi)^{\gamma/2}|R_n|^{1/2}} \exp\left[-\tfrac{1}{2}(z_n - H_n x_n)^T R_n^{-1}(z_n - H_n x_n)\right] \tag{4.5.35}
$$

Because the components of the vector measurement error $v_n$ are independent, Smith treats $z_n$ as a scalar: a vector $z_n$ can be thought of as a sequence of scalar measurements, as mentioned before. Thus all subsequent expressions involving the measurement $z_n$ can be thought of as expressions involving a component of a vector measurement. For notational convenience the superscript $j$ denoting the component is dropped. Then with $R_n = k R_{\mathrm{nom}_n}$,

$$
f(z_n \mid x_n, k, Z_{n-1}) = (2\pi R_{\mathrm{nom}_n})^{-1/2} k^{-1/2} \exp\left[-\frac{(z_n - h_n^T x_n)^2}{2kR_{\mathrm{nom}_n}}\right] \tag{4.5.36}
$$


By Bayes' rule

$$
f(x_n, k \mid Z_{n-1}) = f(x_n \mid k, Z_{n-1})\, f(k \mid Z_{n-1}) \tag{4.5.37}
$$
In Chapter 2 it was shown that

$$
f(x_n \mid k, Z_{n-1}) = \frac{1}{(2\pi)^{\theta/2}|P_{n|n-1}|^{1/2}} \exp\left[-\tfrac{1}{2}(x_n - \hat x_{n|n-1})^T P_{n|n-1}^{-1}(x_n - \hat x_{n|n-1})\right] \tag{4.5.38}
$$

where $\hat x_{n|n-1}$ is the maximum likelihood estimate of $x_n$ before the $n$th measurement, $P_{n|n-1}$ is the conditional covariance of $x_n$ about its conditional mean $\hat x_{n|n-1}$, and $\theta$ is the dimension of the state $x_n$. Both the state estimate and its conditional covariance are functions of the unknown $k$.

It will now be shown that $f(x_n, k \mid Z_n)$ has a particular form and that this form is preserved after repeated measurements.


It is assumed that the distribution of the initial state $x_0$ is a normal distribution with mean $\hat x_{0|0}$ and covariance $P_{0|0}$:

$$
f(x_0) = \frac{1}{(2\pi)^{\theta/2}|P_{0|0}|^{1/2}} \exp\left[-\tfrac{1}{2}(x_0 - \hat x_{0|0})^T P_{0|0}^{-1}(x_0 - \hat x_{0|0})\right] \tag{4.5.39}
$$

Initially, $x_0$ is independent of the parameter $k$, so the joint probability density function of $x_0$ and $k$ is

$$
f(x_0, k) = f(x_0)\, f(k) \tag{4.5.40}
$$

The joint probability density function of the state and the parameter $k$ immediately before the first measurement is given by

$$
f(x_1, k) = f(x_1 \mid k)\, f(k) \tag{4.5.41}
$$
It is easy to show that

$$
f(x_1 \mid k) = \frac{1}{(2\pi)^{\theta/2}|P_{1|0}|^{1/2}} \exp\left[-\tfrac{1}{2}(x_1 - \hat x_{1|0})^T P_{1|0}^{-1}(x_1 - \hat x_{1|0})\right] \tag{4.5.42}
$$

where

$$
\hat x_{1|0} = \Phi(1,0)\hat x_{0|0}, \qquad P_{1|0} = \Phi(1,0)P_{0|0}\Phi^T(1,0) + Q_1
$$

and $Q_1$ is the covariance of the driving noise $w_1$. Then using (4.5.31) and (4.5.42), (4.5.41) becomes

$$
f(x_1, k) = C_1 k^{-\left(\frac{a}{2}+1\right)} \exp\left\{-\tfrac{1}{2}\left[\frac{ab}{k} + (x_1 - \hat x_{1|0})^T P_{1|0}^{-1}(x_1 - \hat x_{1|0})\right]\right\} \tag{4.5.43}
$$

where

$$
C_1 = \frac{C\, b^{\frac{a}{2}+1}}{(2\pi)^{\theta/2}|P_{1|0}|^{1/2}}
$$

Because of the form of (4.5.43), $f(x_1, k)$ is termed a normal inverted gamma probability density function. Then by (4.5.33)


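$$
f(x_1, k \mid Z_1) = C_2 k^{-\frac{a+3}{2}} \exp\left\{-\tfrac{1}{2}\left[\frac{ab}{k} + \frac{(z_1 - h_1^T x_1)^2}{kR_{\mathrm{nom}_1}} + (x_1 - \hat x_{1|0})^T P_{1|0}^{-1}(x_1 - \hat x_{1|0})\right]\right\} \tag{4.5.44}
$$

where

$$
C_2 = \frac{C_1}{(2\pi R_{\mathrm{nom}_1})^{1/2} f(z_1)}
$$

is the normalizing coefficient.

After extensive manipulation, (4.5.44) can be written in terms of new parameters $\hat x_{1|1}$, $a'$, and $b'$: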
$$
f(x_1, k \mid Z_1) = C_2 k^{-\left(\frac{a'}{2}+1\right)} \exp\left\{-\tfrac{1}{2}\left[\frac{a'b'}{k} + (x_1 - \hat x_{1|1})^T P_{1|1}^{-1}(x_1 - \hat x_{1|1})\right]\right\} \tag{4.5.45}
$$

where

$$
\hat x_{1|1} = \hat x_{1|0} + A_1(z_1 - h_1^T\hat x_{1|0}) \tag{4.5.46}
$$

$$
P_{1|1} = P_{1|0} - A_1(kR_{\mathrm{nom}_1} + h_1^T P_{1|0} h_1)A_1^T \tag{4.5.47}
$$

$$
A_1 = \frac{P_{1|0} h_1}{h_1^T P_{1|0} h_1 + kR_{\mathrm{nom}_1}} \tag{4.5.48}
$$

$$
b' = \frac{1}{a+1}\left[ab + \frac{k(z_1 - h_1^T\hat x_{1|0})^2}{h_1^T P_{1|0} h_1 + kR_{\mathrm{nom}_1}}\right] \tag{4.5.49}
$$

$$
a' = a + 1 \tag{4.5.50}
$$

Thus, the joint conditional density function after the first measurement also has a normal inverted gamma form.

Using the same procedures as above, it can be shown that $f(x_n, k \mid Z_n)$ has a normal inverted gamma form for any $n$. The appropriate parameters of the density function can be computed using recursive relationships of the form shown in (4.5.46)-(4.5.50). Each component of the measurement has associated with it its own $a$, $b$, $R_{\mathrm{nom}_n}$, and $h_n$, which are used in these recursive relationships when that particular type of observation is being considered. The resulting $a'$ and $b'$ are not updated again until another observation of the same type is considered. On the other hand, $\hat x_{n|n-1}$ and $P_{n|n-1}$, being associated with the state $x_n$, which is common to all observation types, are updated at each and every data processing stage.

Unfortunately, Eqs. (4.5.46)-(4.5.50) cannot be computed in a real problem because they involve $k$, which is unknown. Thus in order to compute $\hat x_{n|n}$, $P_{n|n}$, and $b'$, an estimate of $k$ is required. Smith dismisses the question of strict optimality and observes that for large $a$ the parameter $b$ is almost equal to the mean of the $k$ distribution. An estimate $\hat k$ of $k$ is then defined to be equal to the parameter $b$, and the following estimation equations are obtained.

$$
\hat x^*_{n|n} = \hat x^*_{n|n-1} + A^*_n(z_n - h_n^T\hat x^*_{n|n-1}) \tag{4.5.51}
$$

$$
P^*_{n|n} = P^*_{n|n-1} - A^*_n(h_n^T P^*_{n|n-1} h_n + \hat k R_{\mathrm{nom}_n})A^{*T}_n \tag{4.5.52}
$$

$$
A^*_n = \frac{P^*_{n|n-1} h_n}{h_n^T P^*_{n|n-1} h_n + \hat k R_{\mathrm{nom}_n}} \tag{4.5.53}
$$

$$
\hat k' = \frac{a}{a+1}\hat k + \frac{1}{a+1}\,\frac{\hat k(z_n - h_n^T\hat x^*_{n|n-1})^2}{h_n^T P^*_{n|n-1} h_n + \hat k R_{\mathrm{nom}_n}} \tag{4.5.54}
$$

$$
a' = a + 1 \tag{4.5.55}
$$
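A scalar-measurement version of (4.5.51)-(4.5.55) fits in a few lines. The sketch below is hypothetical throughout: a two-state constant-velocity model, a true precision factor `k_true = 4`, and prior values `k_hat = 1`, `a = 5` chosen for illustration, with the propagation step written in the usual form for the model (4.5.27). It simply prints the final $\hat k$; as discussed below, whether this estimator is unbiased is open to question, so no particular value is claimed.

```python
import numpy as np

def smith_update(x_pred, P_pred, z, h, r_nom, k_hat, a):
    """One scalar-measurement step of (4.5.51)-(4.5.55), with k_hat used in
    place of the unknown precision factor k."""
    s = h @ P_pred @ h + k_hat * r_nom                       # residual variance
    A = P_pred @ h / s                                       # gain (4.5.53)
    resid = z - h @ x_pred                                   # scalar residual
    x_upd = x_pred + A * resid                               # (4.5.51)
    P_upd = P_pred - s * np.outer(A, A)                      # (4.5.52)
    k_new = (a * k_hat + k_hat * resid**2 / s) / (a + 1)     # (4.5.54)
    return x_upd, P_upd, k_new, a + 1                        # (4.5.55)

# Hypothetical run: constant-velocity state, position measured with true
# variance k_true * r_nom.
rng = np.random.default_rng(6)
Phi = np.array([[1.0, 1.0], [0.0, 1.0]])
Qw = np.diag([0.0, 0.01])
h = np.array([1.0, 0.0])
r_nom, k_true = 1.0, 4.0

x = np.array([0.0, 1.0])
x_pred, P_pred = np.zeros(2), 10.0 * np.eye(2)
k_hat, a = 1.0, 5.0                                          # prior guesses
for n in range(500):
    z = h @ x + rng.normal(scale=np.sqrt(k_true * r_nom))
    x_upd, P_upd, k_hat, a = smith_update(x_pred, P_pred, z, h, r_nom, k_hat, a)
    x = Phi @ x + np.array([0.0, rng.normal(scale=0.1)])    # w_n ~ N(0, Qw)
    x_pred, P_pred = Phi @ x_upd, Phi @ P_upd @ Phi.T + Qw
print(k_hat)
```

Note that a residual whose square equals its predicted variance leaves $\hat k$ unchanged, since the update is a weighted average of $\hat k$ and $\hat k$ times the normalized squared residual.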


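It can be seen that (4.5.51) and (4.5.52) are just the maximum likelihood filter equations, except that $\hat k$ is used in place of the unknown $k$. The state estimate and its "computed" covariance matrix are propagated between measurements using the relationships

$$
\hat x^*_{n+1|n} = \Phi(n+1,n)\hat x^*_{n|n} \tag{4.5.56}
$$

$$
P^*_{n+1|n} = \Phi(n+1,n)P^*_{n|n}\Phi^T(n+1,n) + Q_{n+1} \tag{4.5.57}
$$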
It should be noted that unless $\hat k = k$, the "computed" covariance matrix $P^*_{n|n}$ does not accurately represent the covariance of the estimation error. Smith attempts to show that the estimator for $k$ given by (4.5.54) is unbiased, but makes several unrecognized approximations in evaluating the expected value of $\hat k'$. He first says that the expected value of the second term in (4.5.54) is given by

$$
e\left[\frac{\hat k(z_n - h_n^T\hat x^*_{n|n-1})^2}{h_n^T P^*_{n|n-1} h_n + \hat k R_{\mathrm{nom}_n}}\right] = \frac{e(\hat k)\, e\left[(z_n - h_n^T\hat x^*_{n|n-1})^2\right]}{h_n^T e(P^*_{n|n-1}) h_n + e(\hat k) R_{\mathrm{nom}_n}} \tag{4.5.58}
$$

However, the expected value of a nonlinear function of the random variables $\hat k$, $P^*_{n|n-1}$, $z_n$, and $\hat x^*_{n|n-1}$ is not equal to the function evaluated at the expected values of these respective variables.

He then states that

$$
e\left[(z_n - h_n^T\hat x^*_{n|n-1})^2\right] = kR_{\mathrm{nom}_n} + h_n^T e(P^*_{n|n-1}) h_n \tag{4.5.59}
$$


However, this is true only if on every trial $P^*_{n|n-1}$ is equal to the actual covariance of the state estimation error. This generally will be true only if $\hat k = k$ at every estimation stage. The third approximation involved is in the computation of $e(P^*_{n|n})$. He obtains this quantity recursively using the following equations:

$$
e(P^*_{n|n}) = e(P^*_{n|n-1}) - e(A^*_n)\left(h_n^T e(P^*_{n|n-1}) h_n + e(\hat k) R_{\mathrm{nom}_n}\right) e(A^{*T}_n)
$$

where

$$
e(A^*_n) = \frac{e(P^*_{n|n-1}) h_n}{h_n^T e(P^*_{n|n-1}) h_n + e(\hat k) R_{\mathrm{nom}_n}}
$$


Here, as before, Smith fails to realize that the expected value of a nonlinear function of a random variable is not equal to the function evaluated at the expected value of the random variable.
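The objection above can be made concrete with an exact computation. The sketch takes a hypothetical three-point distribution for $\hat k$ and a gain-like function $g(k) = k/(c + k)$ with $c$ held fixed (standing in for $h_n^TP^*_{n|n-1}h_n$); the expected value of $g(\hat k)$ differs from $g$ evaluated at $e(\hat k)$.

```python
from fractions import Fraction as F

# Hypothetical discrete distribution for k_hat: values and probabilities.
values = [F(1, 2), F(2), F(4)]
probs = [F(1, 4), F(1, 2), F(1, 4)]
c = F(1)          # stands in for h^T P h, held fixed here

def g(k):
    """A gain-like nonlinear function k / (c + k)."""
    return k / (c + k)

mean_k = sum(p * v for p, v in zip(probs, values))
E_g = sum(p * g(v) for p, v in zip(probs, values))
print(mean_k, g(mean_k), E_g)   # prints: 17/8 17/25 37/60
```

Because $g$ is concave, $e[g(\hat k)] \le g(e(\hat k))$ by Jensen's inequality, so substituting expected values inside such ratios systematically misstates the mean.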

In testing the above theoretical results, Smith only simulates the equations for the mean of the estimate of k and the mean computed covariance matrix. This is unfortunate, since many approximations were made in their derivation, namely the rather dubious use of the expectation operators above. His results are therefore somewhat open to question, since he did not simulate the actual performance of the estimator of the state and the parameter k in a realistic situation.
Chapter 5 

TESTING OF STATISTICAL HYPOTHESES 

5.1 Introduction 

In Chapters 3 and 4 techniques for estimating the state 
and noise variance parameters were discussed, and the neces- 
sary equations for the solution of the problem derived. As 
was seen, even in the simplest case, considerably more compu- 
tation was needed for estimating the noise variance parameters 
as compared with estimation of the state alone. In those 
applications when estimation of the state is of primary 
importance, estimation of the noise parameters should not be 
undertaken unless there is reason to believe that the a priori 
estimates of these parameters are sufficiently in error to 
seriously affect the state estimation. The purpose of this 
chapter is to develop expressions and criteria which allow a 
decision to be made as to whether observed data are consistent 
with the assumptions about the values of the noise variance 
parameters. If it is concluded that the data are not consis- 
tent, then estimation of the parameters using the techniques 
of the previous chapters should be undertaken. 

Testing of statistical hypotheses is an important part of statistical analysis but is perhaps one of the least understood and applied techniques in optimal estimation theory. Historically this is so because of the lack of a consistent theory generally applicable to a wide class of problems. But even today, long after the tools of hypothesis testing have been developed, little use is often made of them. This can result in major difficulties in applying optimal estimation theory to operational situations.

A statistical hypothesis is usually a statement about 
one or more population distributions, and specifically about 
one or more parameters of such population distributions. It 
is always a statement about the population, not about a 
finite sample taken from the population. 

There are two types of hypotheses which are of interest, 
namely simple and composite hypotheses. Hypotheses that 
completely specify a population distribution are known as 
simple hypotheses. An example of such a hypothesis is: the population is normal with mean $m_0$ and standard deviation $\sigma_0$, where $m_0$ and $\sigma_0$ are specified values. When the population is not determined completely by the hypothesis, the hypothesis is known as composite. An example of such a hypothesis is: the population is normal with mean $m_0$. Here the exact population distribution is not specified, since no requirement was put on $\sigma$, the population standard deviation.

Hypotheses may also be classified by whether they specify exact parameter values or merely a range or interval of such values. For example, the hypothesis $m = m_0$ is an exact hypothesis, whereas $m \ge m_0$ is not exact.

Whatever procedure may be used for testing a hypothesis, 
that is, deciding on the basis of observed data whether to 
accept or reject the hypothesis, there are two possible errors 
involved: 1) rejecting the hypothesis when it is true, and
2) not rejecting the hypothesis when it is false. For any
given situation, there may exist a family of different tests 
of the same hypothesis all of which give the same probability 
of rejecting the hypothesis when it is true but result in 
different probabilities of accepting the hypothesis when it 
is in fact false. It seems reasonable that the "best" test 
is the one which minimizes the probability of accepting a 
false hypothesis for a given probability of rejecting the 
true hypothesis. 

All tests involve finding a test variable, or sample 
characteristic, which is a function of the observed data. 

One of the first problems to be faced in making a decision 
from the data is that of choosing the relevant and appro- 
priate sample characteristic for the particular purpose. 
Different combinations of the sample data give different 
kinds and amounts of information about the population. 
Reaching a conclusion about some population characteristic 
requires effective use of the right information in the 
sample, and various sample characteristics differ in their 
relevance to different questions about the population. 

Once the sample characteristic has been selected, a 
"critical region" of the test is defined such that if the 
characteristic lies within the critical region the hypo- 
thesis is accepted, and if it lies outside the critical 
region, the hypothesis is rejected. 

Let S be the sample space of outcomes of an experiment and x denote an arbitrary element of S. Let $H_0$ be the hypothesis being tested (called the null hypothesis), and let w denote the critical region. The probability of the first kind of error, rejecting $H_0$ when it is true, is denoted by

$$
P(x \in (S - w) \mid H_0) = \alpha \tag{5.1.1}
$$

where $\alpha$ is called the level of significance of the test. The probability of the second kind of error, accepting a false hypothesis, is denoted by

$$
P(x \in w \mid H) = \beta(H) \tag{5.1.2}
$$

where H is a particular alternative hypothesis in the class of all possible alternative hypotheses. The function

$$
\gamma(H) = 1 - \beta(H)
$$

defined over all possible H is called the power function and, for a particular value of H, is called the power of the test of H. The problem of statistical hypothesis testing is that of determining a critical region such that, for a given level of significance, the power of the test is as large as possible.
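For the test on the mean treated in Section 5.4, the two error probabilities can be computed directly. The sketch below assumes the acceptance region $w = \{|\bar x - m_0| < t\,\sigma/\sqrt n\}$ with $\sigma$ known and hypothetical numbers ($n = 25$, $t = 1.96$). It evaluates $\alpha$ from (5.1.1), and $\beta(m)$ and the power $1 - \beta(m)$ from (5.1.2), for a few alternative means; `acceptance_prob` is a hypothetical helper name.

```python
from statistics import NormalDist

def acceptance_prob(m, m0, sigma, n, t):
    """P(xbar in w | true mean m) for w = {|xbar - m0| < t sigma / sqrt(n)}."""
    se = sigma / n ** 0.5
    dist = NormalDist(mu=m, sigma=se)          # sampling distribution of xbar
    return dist.cdf(m0 + t * se) - dist.cdf(m0 - t * se)

m0, sigma, n, t = 0.0, 1.0, 25, 1.96
alpha = 1.0 - acceptance_prob(m0, m0, sigma, n, t)        # (5.1.1)
print(f"alpha = {alpha:.4f}")
for m in (0.2, 0.5, 1.0):
    beta = acceptance_prob(m, m0, sigma, n, t)             # (5.1.2)
    print(f"m = {m}: beta = {beta:.4f}, power = {1 - beta:.4f}")
```

For a fixed $\alpha$, the power grows as the alternative mean moves away from $m_0$ and as $n$ increases.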

The next sections of this chapter are devoted to discus- 
sion of certain sample characteristics and distributions 
upon which subsequent hypothesis tests are based. 





5.2 Sampling Characteristics and Distributions

Let x be a random variable with probability density function f(x) and consider n independent repetitions of a random experiment to which x is attached. Performing the series of n repetitions, n observed values of x are obtained, denoted $x_1,\dots,x_n$. Any sample characteristic will be a function of the sample values, say $g(x_1,\dots,x_n)$, and accordingly the probability distribution of this latter variable will be called the sampling distribution of the characteristic $g(x_1,\dots,x_n)$.

The sample mean is defined by

$$
\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{5.2.1}
$$

and the sample variance by

$$
s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)^2 \tag{5.2.2}
$$


Let the population characteristics of the random variable x be

$$
e(x_i) = m, \qquad e\left[(x_i - m)^2\right] = \sigma^2
$$

Then the expected value of the sample characteristic $\bar x$ is equal to the population characteristic, m. Moreover, the variance of $\bar x$, which is $\sigma^2/n$, will be small for large values of n. Thus for a sufficiently large value of n, the sample mean $\bar x$ will be approximately equal to its expected value m. If m is unknown, $\bar x$ can be used as an estimate of m.

Consider the variance of the sample.

$$
e(s^2) = e\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)^2\right] = \frac{n-1}{n}\sigma^2
$$

Thus the expected value of the sampling characteristic $s^2$ is not equal to the population characteristic $\sigma^2$ but is equal to $((n-1)/n)\sigma^2$. This difference is insignificant for large n, but for moderate n it will be preferable to consider the corrected sample variance

$$
\frac{n}{n-1}s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x)^2
$$

which has an expected value exactly equal to $\sigma^2$.
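The bias factor $(n-1)/n$ can be verified exactly by enumerating every equally likely sample from a small discrete population. The sketch assumes a fair die as the population and uses rational arithmetic, so the comparison is exact rather than approximate.

```python
from fractions import Fraction as F
from itertools import product

def sample_var(xs):
    """s^2 of (5.2.2), with divisor n."""
    n = len(xs)
    mean = sum(xs, F(0)) / n
    return sum(((x - mean) ** 2 for x in xs), F(0)) / n

# Hypothetical population: a fair die (m = 7/2, sigma^2 = 35/12).
faces = [F(i) for i in range(1, 7)]
sigma2 = sum(((x - F(7, 2)) ** 2 for x in faces), F(0)) / 6

n = 3
samples = list(product(faces, repeat=n))       # all 216 equally likely samples
E_s2 = sum((sample_var(s) for s in samples), F(0)) / len(samples)
print(sigma2, E_s2, F(n - 1, n) * sigma2)      # prints: 35/12 35/18 35/18
```

Multiplying `E_s2` by $n/(n-1)$ returns exactly $35/12$, which is the unbiasedness of the corrected sample variance.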

The variance of $s^2$ is given by the expression

$$
e\left[\left(s^2 - e(s^2)\right)^2\right] = \frac{\mu_4 - \mu_2^2}{n} - \frac{2(\mu_4 - 2\mu_2^2)}{n^2} + \frac{\mu_4 - 3\mu_2^2}{n^3} \tag{5.2.3}
$$

where $\mu_2$ and $\mu_4$ are the second and fourth central moments of the distribution function of x (Ref. 3, p. 183).

For large n, the variance of $s^2$ will be small and $s^2$ can be expected to agree approximately with the population variance since, as already pointed out, the expected value of $s^2$ is practically equal to $\sigma^2$ when n is large.
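Formula (5.2.3) can be checked exactly in the same spirit. The sketch below assumes a fair-die population, enumerates all $6^4$ samples of size $n = 4$, computes the exact variance of $s^2$ with rational arithmetic, and compares it with (5.2.3) evaluated from the population central moments.

```python
from fractions import Fraction as F
from itertools import product

def sample_var(xs):
    """s^2 of (5.2.2), with divisor n."""
    n = len(xs)
    mean = sum(xs, F(0)) / n
    return sum(((x - mean) ** 2 for x in xs), F(0)) / n

faces = [F(i) for i in range(1, 7)]            # hypothetical population: fair die
m = sum(faces, F(0)) / 6
mu2 = sum(((x - m) ** 2 for x in faces), F(0)) / 6
mu4 = sum(((x - m) ** 4 for x in faces), F(0)) / 6

n = 4
vals = [sample_var(s) for s in product(faces, repeat=n)]
mean_s2 = sum(vals, F(0)) / len(vals)
var_s2 = sum(((v - mean_s2) ** 2 for v in vals), F(0)) / len(vals)

formula = ((mu4 - mu2**2) / n - 2 * (mu4 - 2 * mu2**2) / n**2
           + (mu4 - 3 * mu2**2) / n**3)                    # (5.2.3)
print(var_s2 == formula)
```

For a normal population ($\mu_4 = 3\mu_2^2$) the same formula reduces to $2\sigma^4(n-1)/n^2$.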

Thus far, the sample mean and variance and their first 
two moments were studied without reference to the density 
function of the random variable involved. In order to obtain 
more precise results about the properties of sampling distri- 
butions, it will be necessary to introduce further assumptions 
about f (x) . The case of interest is when f (x) is a normal 
density function. 

If x is an observation from a normal distribution with population mean m and variance $\sigma^2$, the probability density function of x is

$$
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - m)^2}{2\sigma^2}} \tag{5.2.4}
$$

It has been assumed that the n observations $x_i$ are independent, so

$$
f(x_1,\dots,x_n) = \prod_{i=1}^{n} f(x_i)
$$

It can be shown that if the n observations $x_i$ are independent normal random variables with population mean m and variance $\sigma^2$, then

1) $\bar x$ is a normal variable with mean m and variance $\sigma^2/n$.

2) $ns^2/\sigma^2$ is a central chi square distributed random variable with n-1 degrees of freedom.

3) $\bar x$ and $s^2$ are independently distributed.

4) $t = \sqrt{n-1}\,(\bar x - m)/s$ is a Student's t distributed random variable with n-1 degrees of freedom.
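Statement 4) uses the divisor-$n$ sample variance $s^2$ of (5.2.2); the same statistic can be written with the corrected standard deviation, since $\sqrt{n-1}/s = \sqrt{n}/s_c$ with $s_c^2 = ns^2/(n-1)$. A short check with hypothetical data, using Python's `statistics` module (`pstdev` divides by $n$, `stdev` by $n - 1$):

```python
import math
from statistics import mean, pstdev, stdev

def t_statistic(xs, m):
    """t = sqrt(n-1) (xbar - m) / s, with s^2 the divisor-n sample variance."""
    n = len(xs)
    return math.sqrt(n - 1) * (mean(xs) - m) / pstdev(xs)

xs = [4.9, 5.3, 5.1, 4.7, 5.4, 5.0]            # hypothetical observations
t1 = t_statistic(xs, 5.0)
# Same value via the corrected standard deviation (divisor n - 1).
t2 = math.sqrt(len(xs)) * (mean(xs) - 5.0) / stdev(xs)
print(t1, t2)
```

Either form is compared with tables of Student's distribution with $n - 1$ degrees of freedom.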

5.3 Confidence Intervals

An understanding of confidence intervals is necessary before testing of simple hypotheses can be undertaken. When estimating the value of a parameter by observations on a random variable, it is usually desirable to obtain not only the estimated parameter value but also a measure of the precision of such an estimate. To obtain such a measure for an estimate $\hat\xi$ of an unknown parameter $\xi$, two positive numbers $\delta$ and $\alpha$ might be found such that the probability that the true value of $\xi$ is included between the limits $\hat\xi \pm \delta$ is equal to $1 - \alpha$, or

$$
P(\hat\xi - \delta < \xi < \hat\xi + \delta) = 1 - \alpha \tag{5.3.1}
$$

For a given probability $1 - \alpha$, high precision of the estimate would be associated with small values of $\delta$. More generally, for an unknown parameter $\xi$, two functions of the sample values, $\xi^*_1$ and $\xi^*_2$, are found such that the probability that the interval $(\xi^*_1, \xi^*_2)$ includes the true value $\xi$ has a given value $1 - \alpha$, or

$$
P(\xi^*_1 < \xi < \xi^*_2) = 1 - \alpha \tag{5.3.2}
$$

Such an interval is called a confidence interval for the parameter, and the probability $1 - \alpha$ is called the confidence coefficient of the interval.

5 . 4 Tests on the Mean 

Two situations will be treated here concerning confidence intervals and tests on the mean: one in which $\sigma$ is presumed known, and the other in which $\sigma$ is not known. The case when $\sigma$ is known is considered first.

Let the variable x be normally distributed with mean m and variance $\sigma^2$, where m is unknown and $\sigma$ is known precisely. Given n independent observed values $x_1, \ldots, x_n$, a confidence interval for the mean is sought.

In Section 5.2 it was stated that the variable

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$

has a normal distribution with mean m and variance $\sigma^2/n$. Therefore

$$ t \triangleq \frac{\sqrt{n}\,(\bar{x} - m)}{\sigma} \qquad (5.4.1) $$

is a zero mean unit variance normally distributed variable.

Let $\alpha$ denote a given fraction, and let $t_\alpha$ be the two-sided $\alpha$ point of t, found directly from a table of the normal distribution. By the definition of $t_\alpha$,

$$ P(-t_\alpha < t < t_\alpha) = 1 - \alpha \qquad (5.4.2) $$
By a simple transformation, (5.4.2) can be written as

$$ P\left(\bar{x} - t_\alpha \frac{\sigma}{\sqrt{n}} < m < \bar{x} + t_\alpha \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \qquad (5.4.3) $$

Equation (5.4.3) is a relation of the type suggested by (5.3.2). Accordingly, the interval

$$ \left(\bar{x} - t_\alpha \frac{\sigma}{\sqrt{n}},\; \bar{x} + t_\alpha \frac{\sigma}{\sqrt{n}}\right) \qquad (5.4.4) $$

is a $1 - \alpha$ confidence interval for m, the limits of the interval are confidence limits for m, and the corresponding confidence coefficient is $1 - \alpha$.

Thus the confidence interval (5.4.4) provides a rule for estimating the parameter m with a constant risk of error equal to $\alpha$, where $\alpha$ can be chosen arbitrarily.
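The interval (5.4.4) can be computed directly. The following is a minimal Python sketch that uses the standard library's `statistics.NormalDist` in place of a normal distribution table. The function name `mean_ci_known_sigma` and the sample data are hypothetical. The two-sided point $t_\alpha$ is taken as the $1 - \alpha/2$ quantile of the unit normal distribution.

```python
from statistics import NormalDist


def mean_ci_known_sigma(xs, sigma, alpha=0.05):
    """1 - alpha confidence interval (5.4.4) for m when sigma is known."""
    n = len(xs)
    xbar = sum(xs) / n
    t_a = NormalDist().inv_cdf(1 - alpha / 2)   # P(-t_a < t < t_a) = 1 - alpha, eq. (5.4.2)
    half_width = t_a * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width


lo, hi = mean_ci_known_sigma([4.9, 5.3, 5.1, 4.7, 5.0], sigma=0.2)
print(lo, hi)   # interval centred on x-bar = 5.0
```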

Testing the hypothesis that m has some given value, say $m_0$, is related to the confidence interval deduced above. In this case a decision is made, based upon the observed data, as to which of the following hypotheses is true:

1) $H_0$: x is normal with mean $m = m_0$ and variance $\sigma^2$ (known)

2) $H_1$: x is normal with mean $m \neq m_0$ and variance $\sigma^2$ (known)

Working at a given level $\alpha$, the confidence limits of m are computed according to (5.4.4). If the given value $m_0$ falls outside the confidence interval, m is said to differ significantly from $m_0$ at the $\alpha$ level; accordingly, $H_0$ is rejected and $H_1$ is accepted. If the confidence interval includes the point $m_0$, no significant difference is said to have been found, and the hypothesis $H_0$ is accepted.

When $H_0$ is in fact true, this test accepts the hypothesis with probability $1 - \alpha$ and consequently rejects it with probability $\alpha$. Thus the probability of committing an error by rejecting the hypothesis when it is true is equal to the level of the test, $\alpha$.

To apply this test, the sample characteristic $\bar{x}$ is found and the quantity t is computed, where

$$ t = \frac{\sqrt{n}\,(\bar{x} - m_0)}{\sigma} \qquad (5.4.5) $$

Denoting by $\alpha$ the desired level of the test, the value $t_\alpha$ is found from a normal distribution table. If $|t| > t_\alpha$, the hypothesis is rejected at the level $\alpha$.
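The test based on (5.4.5) can be sketched in the same way. As before, the function name and data are illustrative assumptions, and `statistics.NormalDist` stands in for the normal distribution table.

```python
from statistics import NormalDist


def z_test_mean(xs, m0, sigma, alpha=0.05):
    """Two-sided test of H0: m = m0 with sigma known, using t of (5.4.5).
    Returns (t, reject), where reject is True when |t| > t_alpha."""
    n = len(xs)
    xbar = sum(xs) / n
    t = n ** 0.5 * (xbar - m0) / sigma
    t_a = NormalDist().inv_cdf(1 - alpha / 2)
    return t, abs(t) > t_a


xs = [4.9, 5.3, 5.1, 4.7, 5.0]
print(z_test_mean(xs, m0=5.0, sigma=0.2))   # t near 0: H0 accepted
print(z_test_mean(xs, m0=4.7, sigma=0.2))   # t about 3.35: H0 rejected
```

At the same level, rejecting $H_0$ here is equivalent to $m_0$ falling outside the interval (5.4.4).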

When $H_0$ is not true, but rather $H_1$ is true, the probability of accepting the incorrect hypothesis with the above test is not $1 - \alpha$. This is because (5.4.5) does not have a zero mean unit variance normal distribution when $m \neq m_0$. However, the variable

$$ t' = \frac{\sqrt{n}\,(\bar{x} - m)}{\sigma} \qquad (5.4.6) $$

does have such a distribution. Define

$$ \Delta \triangleq \frac{\sqrt{n}\,(m - m_0)}{\sigma} \qquad (5.4.7) $$

Then

$$ t = t' + \Delta $$


The probability that the test variable t lies within the range $|t| < t_\alpha$ is

$$ P(-t_\alpha < t < t_\alpha) = P(-t_\alpha - \Delta < t' < t_\alpha - \Delta) $$

Define

$$ \beta = P(-t_\alpha < t < t_\alpha) = \text{probability of accepting } H_0 \text{ when } H_1 \text{ is true} $$

Then

$$ \beta = \int_{-t_\alpha - \Delta}^{t_\alpha - \Delta} f(u)\,du $$

where f(u) is the normal probability density function with zero mean and unit variance. Since f(u) is symmetric about u = 0,


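$$ \beta = \frac{1}{2}\left[ \int_{-(t_\alpha + \Delta)}^{t_\alpha + \Delta} f(u)\,du + \int_{-(t_\alpha - \Delta)}^{t_\alpha - \Delta} f(u)\,du \right] $$

Define

$$ t_{\beta_1} = t_\alpha - \Delta, \qquad t_{\beta_2} = t_\alpha + \Delta $$

$$ \beta_1 = \int_{-t_{\beta_1}}^{t_{\beta_1}} f(u)\,du, \qquad \beta_2 = \int_{-t_{\beta_2}}^{t_{\beta_2}} f(u)\,du $$

Then

$$ \beta = \frac{1}{2}(\beta_1 + \beta_2) $$

and

$$ \gamma = 1 - \beta = 1 - \frac{1}{2}(\beta_1 + \beta_2) $$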


For $-t_\alpha < \Delta < t_\alpha$, both $\beta_1$ and $\beta_2$ are positive; if $\Delta > t_\alpha$, $\beta_1$ is negative, and if $\Delta < -t_\alpha$, $\beta_2$ is negative. It can be seen that

$$ \beta_1 = \pm P(|t'| < |t_{\beta_1}|) $$

where the + sign is used when $t_{\beta_1}$ is positive, and

$$ \beta_2 = \pm P(|t'| < |t_{\beta_2}|) $$

with the same sign convention.

Thus the power of the test, $\gamma$, is a function of m, $m_0$, $\sigma$, $\alpha$, and n. However, if $\gamma$ is plotted as a function of the nondimensional parameter $\Delta$, the only free variable is $\alpha$, the level of the test. Such a plot is shown in Figure 5.1 for $\alpha$ = .05, .10, and .20.
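The power $\gamma = 1 - \beta$, as a function of $\Delta$, follows from $\beta = \Phi(t_\alpha - \Delta) - \Phi(-t_\alpha - \Delta)$, where $\Phi$ is the unit normal distribution function; this is simply the integral of f(u) given above. The short Python sketch below (hypothetical function name, standard library only) could be used to tabulate curves like those of Figure 5.1.

```python
from statistics import NormalDist

_STD = NormalDist()   # zero mean, unit variance


def z_test_power(delta, alpha):
    """Power gamma = 1 - beta of the two-sided test based on (5.4.5),
    with beta = P(-t_a - delta < t' < t_a - delta) and delta as in (5.4.7)."""
    t_a = _STD.inv_cdf(1 - alpha / 2)
    beta = _STD.cdf(t_a - delta) - _STD.cdf(-t_a - delta)
    return 1.0 - beta


for alpha in (0.05, 0.10, 0.20):
    print(alpha, [round(z_test_power(d, alpha), 3) for d in (0.0, 1.0, 2.0, 3.0, 4.0)])
```

At $\Delta = 0$ the power equals $\alpha$, and for fixed $\Delta$ it increases with $\alpha$.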

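Note that

$$ \beta = \int_{-t_\alpha - \Delta}^{-t_\alpha} f(u)\,du + \int_{-t_\alpha}^{t_\alpha} f(u)\,du + \int_{t_\alpha}^{t_\alpha - \Delta} f(u)\,du $$

and

$$ \int_{-t_\alpha}^{t_\alpha} f(u)\,du = 1 - \alpha $$

so

$$ \beta = 1 - \alpha + \int_{-t_\alpha - \Delta}^{-t_\alpha} f(u)\,du + \int_{t_\alpha}^{t_\alpha - \Delta} f(u)\,du $$

It can be shown that for any $\Delta \neq 0$,

$$ \int_{-t_\alpha - \Delta}^{-t_\alpha} f(u)\,du + \int_{t_\alpha}^{t_\alpha - \Delta} f(u)\,du < 0 $$

so

$$ \beta < 1 - \alpha \quad \text{for } \Delta \neq 0, \qquad \beta = 1 - \alpha \quad \text{for } \Delta = 0 $$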


In other words, the probability of accepting a false hypothesis $H_0$ is less than the probability of accepting a true hypothesis $H_0$. As $|\Delta|$ increases, $\beta$ decreases, with the limits $\beta = 1 - \alpha$ for $\Delta = 0$ and $\beta \to 0$ as $|\Delta| \to \infty$.

Since $\beta$ decreases as $\alpha$ increases, setting $\alpha$ larger makes for relatively more powerful tests of $H_0$. The power curves shown in Figure 5.1 indicate that the test with $\alpha$ = .10 is more powerful than the test with $\alpha$ = .05 over all possible values of m under $H_1$. Making the probability of error in rejecting a true hypothesis larger has the effect of making the test more powerful. The proper value of $\alpha$ for any particular application depends upon the relative penalty paid for overlooking a true departure from $H_0$ versus rejecting $H_0$ falsely.

For a given m, $m_0$, and $\sigma$, increasing the sample size n increases $|\Delta|$, so the power of the test increases with n. A similar increase in test power could be achieved by reducing $\sigma$. In the application under study, however, $\sigma$ is not a variable that is easily reduced, so the only effective way to increase the test power is to increase $\alpha$ or n.
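Because $\Delta = \sqrt{n}(m - m_0)/\sigma$ grows like $\sqrt{n}$, the smallest sample size that gives a required power can be found by direct search. The sketch below is an illustration under assumed values: a shift $m - m_0 = 1$, $\sigma = 2$, $\alpha$ = .05, and a target power of .80. The function name is hypothetical.

```python
from statistics import NormalDist


def min_sample_size(shift, sigma, alpha, target_power, n_max=100000):
    """Smallest n whose two-sided test power reaches target_power, for a true
    shift m - m0 = shift; Delta = sqrt(n) * shift / sigma as in (5.4.7)."""
    std = NormalDist()
    t_a = std.inv_cdf(1 - alpha / 2)
    for n in range(1, n_max + 1):
        delta = n ** 0.5 * shift / sigma
        power = 1.0 - (std.cdf(t_a - delta) - std.cdf(-t_a - delta))
        if power >= target_power:
            return n, power
    return None   # target not reached within n_max


result = min_sample_size(shift=1.0, sigma=2.0, alpha=0.05, target_power=0.80)
print(result)
```

For these values the search returns n = 32, with a power of about .807.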

Establishing confidence intervals and tests on the mean m with $\sigma$ unknown is similar to the previous work, except that the sampling characteristic and its distribution are somewhat different.

In Section 5.2 it was stated that the variable

$$ t = \frac{\sqrt{n-1}\,(\bar{x} - m)}{s} \qquad (5.4.8) $$

has a Student's distribution with n-1 degrees of freedom, where $\bar{x}$ is the sample mean and s is the sample standard deviation. Let $\alpha$ denote a given fraction, and let $t_\alpha$ be the two-sided $\alpha$ point of t for n-1 degrees of freedom, found directly from a table of the Student's distribution. By the definition of $t_\alpha$,

$$ P\left(-t_\alpha < \frac{\sqrt{n-1}\,(\bar{x} - m)}{s} < t_\alpha\right) = 1 - \alpha \qquad (5.4.9) $$

In the same fashion as before, the interval

$$ \left(\bar{x} - t_\alpha \frac{s}{\sqrt{n-1}},\; \bar{x} + t_\alpha \frac{s}{\sqrt{n-1}}\right) \qquad (5.4.10) $$

is a $1 - \alpha$ confidence interval for m, the limits of the interval are confidence limits for m, and the corresponding confidence coefficient is $1 - \alpha$.
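The interval (5.4.10) requires a value from a Student's t table. The Python sketch below takes $t_\alpha$ as an argument rather than computing it, because the standard library has no Student's distribution. The value 2.262 used in the example is the tabulated two-sided .05 point for 9 degrees of freedom. The function name and data are illustrative. With the 1/n definition of $s^2$, the quantity $s/\sqrt{n-1}$ equals the more familiar $\hat{s}/\sqrt{n}$, where $\hat{s}^2$ is normalized by n-1.

```python
import math


def mean_ci_unknown_sigma(xs, t_a):
    """1 - alpha interval (5.4.10); t_a is the tabulated two-sided alpha point
    of Student's distribution with n-1 degrees of freedom."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)   # 1/n convention of Section 5.2
    half_width = t_a * s / math.sqrt(n - 1)
    return xbar - half_width, xbar + half_width


lo, hi = mean_ci_unknown_sigma([float(k) for k in range(1, 11)], t_a=2.262)  # n = 10, alpha = .05
print(lo, hi)
```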
Testing the hypothesis that m has some given value, say $m_0$, with $\sigma$ unknown is quite similar to the test with $\sigma$ known. A decision is made, based upon the observed data, as to which of the following hypotheses is true:

1) $H_0$: x is normal with mean $m = m_0$

2) $H_1$: x is normal with mean $m \neq m_0$

To apply this test, the sample characteristics $\bar{x}$ and $s^2$ are found and the quantity t is computed, where

$$ t = \frac{\sqrt{n-1}\,(\bar{x} - m_0)}{s} \qquad (5.4.11) $$

Denoting by $\alpha$ the desired level of the test, the value $t_\alpha$ is found from a Student's distribution table. If $|t| > t_\alpha$, the hypothesis $H_0$ is rejected at the level $\alpha$.

Define

$$ t' \triangleq \frac{\sqrt{n-1}\,(\bar{x} - m)}{s} \qquad (5.4.12) $$

and

$$ \Delta \triangleq \frac{\sqrt{n-1}\,(m - m_0)}{s} \qquad (5.4.13) $$

Then

$$ t = t' + \Delta $$

If $H_0$ is false ($H_1$ is true), t as defined by (5.4.11) does not have a Student's distribution, but t' does. So

$$ P(-t_\alpha < t < t_\alpha) = P(-t_\alpha - \Delta < t' < t_\alpha - \Delta) $$

Define

$$ \beta = P(-t_\alpha < t < t_\alpha) = \text{probability of accepting } H_0 \text{ when } H_1 \text{ is true} $$

Then

$$ \beta = \frac{1}{2}(\beta_1 + \beta_2) $$

where $\beta_1$ and $\beta_2$ are as defined before, except that f(u) is now the density of the Student's distribution with n-1 degrees of freedom.

As before, a power curve versus $\Delta$ can be constructed for a given $\alpha$. Now, however, the curve also depends on n, the number of sample values.

As in the case with $\sigma$ known, $\beta < 1 - \alpha$ for any $|\Delta| > 0$, so the probability of accepting a false hypothesis is less than the probability of accepting a true hypothesis, with $\beta \to 0$ as $|\Delta| \to \infty$. Increasing $\alpha$ results in a more powerful test of $H_0$ but also increases the risk of rejecting a true hypothesis.

Figure 5.2 shows the power of the above test versus the nondimensional parameter $\Delta$ for $\alpha$ = .05 and .10, with n = 10. Figure 5.3 shows the power versus $\Delta$ for n = 5, 10, and 20, with $\alpha$ = .10.

In this section, the normal distribution and the Student's statistic have been used for drawing inferences about the unknown mean of a population from which observations are obtained. The distribution of the t-statistic defined by (5.4.11) is obtained under the following assumptions:

1) the distribution of the random variable x is normal

2) the observations are mutually independent

3) the mean of the population is exactly $m_0$

[Figure 5.2: Test Power vs $\Delta$ and $\alpha$ for Fixed n]

[Figure 5.3: Test Power vs $\Delta$ and n for Fixed $\alpha$]

Theoretical and empirical studies show that the t distribution is not sensitive to moderate departures from normality, so its application is not strictly governed by the normality assumption. A significant t therefore should not be interpreted as indicating a departure from normality of the observations.

Suppose that all the observations are mutually correlated, with a common positive correlation $\rho$ between any two. Then, with $m = m_0$,

$$ E\left[(\bar{x} - m_0)^2\right] = \frac{\sigma^2}{n}\left[1 + (n-1)\rho\right] \qquad (5.4.14) $$

$$ E(s^2) = \frac{n-1}{n}\,\sigma^2 (1 - \rho) \qquad (5.4.15) $$

Instead of the t-statistic (5.4.11), consider

$$ t^2 = \frac{(n-1)(\bar{x} - m_0)^2}{s^2} \qquad (5.4.16) $$

which, when assumptions 1) through 3) hold, can be shown to have an F distribution with 1 and n-1 degrees of freedom. From (5.4.14) and (5.4.15), the expected values of the numerator and denominator of $t^2$ are

$$ \frac{n-1}{n}\,\sigma^2\left[1 + (n-1)\rho\right] \quad \text{and} \quad \frac{n-1}{n}\,\sigma^2 (1 - \rho) \qquad (5.4.17) $$

The ratio of these expectations is unity when $\rho = 0$, is greater than unity when $\rho > 0$, and tends to $+\infty$ as $\rho \to 1$. Thus a large value of $t^2$ is expected when $\rho$ is large and positive, even when $m_0$ is exactly equal to m. A significant t may therefore be due to a departure from assumption 2).

Finally, when assumptions 1) and 2) are true and $m \neq m_0$, the ratio of the expected values of the numerator and denominator of $t^2$ in (5.4.16) is

$$ \frac{n(m - m_0)^2}{\sigma^2} + 1 \qquad (5.4.18) $$

compared with 1 when $m = m_0$, so large values of t occur when assumption 3) is wrong. This is precisely why the t-test is used to test the null hypothesis concerning the mean of a distribution.

In computing (5.4.17), the extreme case of mutual dependence with a common correlation $\rho$ was considered. In general, however, any dependence that gives positive correlation to pairs of variables will increase the significance of t, so the test will indicate any significant departure from assumption 2).
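The effect described by (5.4.14) through (5.4.17) can be seen in a small simulation. The sketch below is a hypothetical illustration. It generates n = 10 equicorrelated normal observations as $x_i = m + \sigma(\sqrt{\rho}\,c + \sqrt{1-\rho}\,e_i)$, with a shared term c, and applies the t test of (5.4.11) with $m_0 = m$, so that assumption 3) holds. Rejections are counted at the .05 level using the tabulated $t_\alpha$ = 2.262 for 9 degrees of freedom.

```python
import math
import random


def rejection_rate(rho, n=10, t_a=2.262, trials=4000, seed=7):
    """Fraction of trials in which the t test of (5.4.11) rejects a TRUE H0 (m0 = m = 0,
    sigma = 1) when every pair of observations has correlation rho."""
    rng = random.Random(seed)
    rejects = 0
    for _ in range(trials):
        common = rng.gauss(0.0, 1.0)          # shared component creates the correlation
        xs = [math.sqrt(rho) * common + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
              for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)   # 1/n convention
        t = math.sqrt(n - 1) * xbar / s
        if abs(t) > t_a:
            rejects += 1
    return rejects / trials


r_indep = rejection_rate(0.0)
r_corr = rejection_rate(0.5)
print(r_indep, r_corr)
```

With $\rho = 0$ the rejection rate stays near .05. With $\rho = 0.5$, a large fraction of the trials (around one half in this setting) reject a true $H_0$.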

5.5 Tests on the Variance

Let the variable x be normal with mean m and variance $\sigma^2$, where m and $\sigma$ are both unknown. Given n independent observed values $x_1, \ldots, x_n$, a confidence interval for the variance $\sigma^2$ is sought. In Section 5.2 it was stated that the variable

$$ \chi^2 = \frac{n s^2}{\sigma^2} $$

has a chi square distribution with n-1 degrees of freedom.

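For any given test level $\alpha$, infinitely many intervals can be found, each containing exactly the area $1 - \alpha$ of this distribution. Among all these intervals, the particular interval $(\chi^2_{\alpha_1}, \chi^2_{\alpha_2})$ is chosen, where $\chi^2_{\alpha_1}$ and $\chi^2_{\alpha_2}$ are the $\alpha_1$ and $\alpha_2$ values of the $\chi^2$ distribution for n-1 degrees of freedom (that is, the values exceeded with probabilities $\alpha_1$ and $\alpha_2$), and

$$ \alpha_1 = 1 - \frac{\alpha}{2}, \qquad \alpha_2 = \frac{\alpha}{2} $$

Each of the tails $\chi^2 < \chi^2_{\alpha_1}$ and $\chi^2 > \chi^2_{\alpha_2}$ contains an equal area $\alpha/2$, and thus

$$ P\left(\chi^2_{\alpha_1} < \frac{n s^2}{\sigma^2} < \chi^2_{\alpha_2}\right) = 1 - \alpha \qquad (5.5.1) $$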


By a simple transformation, (5.5.1) can be written as

$$ P\left(\frac{n s^2}{\chi^2_{\alpha_2}} < \sigma^2 < \frac{n s^2}{\chi^2_{\alpha_1}}\right) = 1 - \alpha \qquad (5.5.2) $$

Thus the interval

$$ \left(\frac{n s^2}{\chi^2_{\alpha_2}},\; \frac{n s^2}{\chi^2_{\alpha_1}}\right) \qquad (5.5.3) $$

is a $1 - \alpha$ confidence interval for $\sigma^2$, the limits of the interval are the confidence limits for $\sigma^2$, and the corresponding confidence coefficient is $1 - \alpha$. The confidence interval (5.5.3) provides a rule for estimating the parameter $\sigma^2$ with a constant risk of error equal to $\alpha$.
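A Python sketch of the interval (5.5.3) follows, with the chi square points supplied from a table. The example uses the .975 and .025 points for 9 degrees of freedom, 2.700 and 19.023, following the convention above that $\chi^2_{\alpha'}$ is exceeded with probability $\alpha'$. The function name and data are illustrative assumptions.

```python
def variance_ci(xs, chi2_lo, chi2_hi):
    """1 - alpha interval (5.5.3) for sigma^2.

    chi2_lo = chi^2_{alpha_1} (the smaller point) and chi2_hi = chi^2_{alpha_2},
    both read from a chi square table with n-1 degrees of freedom."""
    n = len(xs)
    xbar = sum(xs) / n
    ns2 = sum((x - xbar) ** 2 for x in xs)   # n s^2 with the 1/n definition of s^2
    return ns2 / chi2_hi, ns2 / chi2_lo


lo, hi = variance_ci([float(k) for k in range(1, 11)], chi2_lo=2.700, chi2_hi=19.023)
print(lo, hi)
```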
Testing the hypothesis that $\sigma^2$ has some given value, say $\sigma_0^2$, is analogous to the tests of the mean given in the previous section. In this case a decision is made as to which of the following hypotheses is true:

1) $H_0$: x is normal with variance $\sigma^2 = \sigma_0^2$

2) $H_1$: x is normal with variance $\sigma^2 \neq \sigma_0^2$

To apply the test, the sample characteristic $s^2$ must be found and the quantity $\chi^2$ computed, where

$$ \chi^2 = \frac{n s^2}{\sigma_0^2} \qquad (5.5.4) $$

Denoting by $\alpha$ the desired level of the test, the values $\chi^2_{\alpha_1}$ and $\chi^2_{\alpha_2}$ are found from a chi square distribution table with n-1 degrees of freedom. If $\chi^2_{\alpha_1} < \chi^2 < \chi^2_{\alpha_2}$, the hypothesis $H_0$ is accepted at the level $\alpha$; otherwise $H_0$ is rejected and $H_1$ is accepted. When $H_0$ is in fact true, this test accepts the hypothesis with probability $1 - \alpha$ and consequently rejects it with probability $\alpha$. Thus the probability of rejecting $H_0$ when it is true is equal to the level of the test, $\alpha$.

When $H_0$ is not true (so $H_1$ is true), the probability of accepting the incorrect $H_0$ is not $1 - \alpha$. This is because (5.5.4) does not have a chi square distribution when $\sigma \neq \sigma_0$. However, the variable

$$ \chi'^2 = \frac{n s^2}{\sigma^2} \qquad (5.5.6) $$

does have a chi square distribution with n-1 degrees of freedom. Define

$$ \eta \triangleq \frac{\sigma_0^2}{\sigma^2} \qquad (5.5.7) $$

Then

$$ \chi'^2 = \eta\,\chi^2 $$

and the probability that the test variable lies within the range $\chi^2_{\alpha_1} < \chi^2 < \chi^2_{\alpha_2}$ is

$$ P(\chi^2_{\alpha_1} < \chi^2 < \chi^2_{\alpha_2}) = P(\eta\chi^2_{\alpha_1} < \chi'^2 < \eta\chi^2_{\alpha_2}) $$

Define

$$ \beta = P(\chi^2_{\alpha_1} < \chi^2 < \chi^2_{\alpha_2}) = \text{probability of accepting } H_0 \text{ when } H_1 \text{ is true} $$

Then

$$ \beta = \int_{\eta\chi^2_{\alpha_1}}^{\eta\chi^2_{\alpha_2}} f(u)\,du $$

where f(u) is the chi square density with n-1 degrees of freedom. Note that unless $\eta = 1$, the areas under the tails of f(u) for $u < \eta\chi^2_{\alpha_1}$ and $u > \eta\chi^2_{\alpha_2}$ are not equal. Define

$$ \beta_1 = \int_0^{\eta\chi^2_{\alpha_1}} f(u)\,du = P(\chi'^2 < \eta\chi^2_{\alpha_1}), \qquad \beta_2 = \int_0^{\eta\chi^2_{\alpha_2}} f(u)\,du = P(\chi'^2 < \eta\chi^2_{\alpha_2}) $$

Then

$$ \beta = \beta_2 - \beta_1 \qquad \text{and} \qquad \gamma = 1 - \beta $$

Again, a power curve versus $\eta$ can be constructed for a given $\alpha$; the curve also depends on n, the number of sample values. Such curves are shown in Figures 5.4 and 5.5.
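The power $\gamma = 1 - (\beta_2 - \beta_1)$ requires the chi square distribution function. The Python sketch below evaluates it with the standard series for the regularized lower incomplete gamma function, $P(k/2, x/2)$. This series is a choice made for the example and is not a method taken from the text; the function names are hypothetical. The table values 2.700 and 19.023 again correspond to n = 10 and $\alpha$ = .05.

```python
import math


def chi2_cdf(x, dof):
    """Chi square distribution function via the regularized lower incomplete gamma series."""
    if x <= 0.0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:      # terms shrink geometrically once k exceeds z - a
        term *= z / (a + k)
        total += term
        k += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def variance_test_power(eta, n, chi2_lo, chi2_hi):
    """gamma = 1 - (beta_2 - beta_1) for the test of (5.5.4); eta = sigma_0^2 / sigma^2 (5.5.7).

    chi2_lo, chi2_hi are the tabulated alpha_1 and alpha_2 points for n-1 dof."""
    beta1 = chi2_cdf(eta * chi2_lo, n - 1)   # P(chi'^2 < eta * chi2_lo)
    beta2 = chi2_cdf(eta * chi2_hi, n - 1)   # P(chi'^2 < eta * chi2_hi)
    return 1.0 - (beta2 - beta1)


for eta in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(eta, round(variance_test_power(eta, 10, 2.700, 19.023), 3))
```

At $\eta = 1$ the computed power is close to $\alpha$ = .05, as it must be. It becomes large once $\eta$ is well away from 1, for example at $\eta$ = 0.25 or 4.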

5.6 Multidimensional Hypothesis Tests with Time Varying Population Parameters

In the preceding sections, hypothesis tests on the time 
invariant parameters of the distribution of a scalar random 
variable were discussed. The results can be generalized to 
include tests on vector random variables with time varying 
parameters. First the case of vector random variables with 
constant population parameters will be discussed. 

Let X be an r × 1 random variable with density function f(X), and consider n independent repetitions of a random experiment to which X is attached. The resulting observed values of X are denoted $X_1, \ldots, X_n$. The sample mean is defined by

$$ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \qquad (5.6.1) $$

and the sample covariance is defined by

$$ S^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T \qquad (5.6.2) $$

i=l 
Let the population characteristics of the random variable X be

$$ E(X_i) = M \quad \text{for } i = 1, \ldots, n $$

$$ E\left[(X_i - M)(X_i - M)^T\right] = P $$

Then the sample mean is a random variable with

$$ E(\bar{X}) = M, \qquad E\left[(\bar{X} - M)(\bar{X} - M)^T\right] = \frac{1}{n} P $$

and the sample covariance is a random variable with

$$ E(S^2) = \frac{n-1}{n} P $$

As in the preceding sections, further assumptions about f(X) must be introduced to obtain more precise results about the properties of the sampling distributions. The case of interest is when f(X) is a multidimensional normal distribution:

$$ f(X) = \frac{1}{(2\pi)^{r/2} |P|^{1/2}} \exp\left[ -\frac{1}{2} (X - M)^T P^{-1} (X - M) \right] $$

It is also assumed that the n observations $X_i$ are independent, so

$$ f(X_1, \ldots, X_n) = \prod_{i=1}^{n} f(X_i) $$


It can be shown that if the n observations are independent normal variables with population mean M and covariance P, then

1) $\bar{X}$ is an r × 1 dimensional normal variable with mean M and covariance P/n.

2) for any fixed vector L, $n L^T S^2 L / (L^T P L)$ is a central chi square distributed random variable with n-1 degrees of freedom.

3) $\bar{X}$ and $S^2$ are independently distributed.

4) for any fixed L, $t = \sqrt{n-1}\, L^T(\bar{X} - M) / \sqrt{L^T S^2 L}$ is a Student's distributed random variable with n-1 degrees of freedom.

Comparing these four results with those of Section 5.2 shows that tests of hypotheses and confidence intervals on a vector random variable can be handled in the same fashion as for a scalar random variable. If the mean and variance of each component of the random variable X are to be tested, the proper choice of L for each test is

$$ L_j = [\,0 \;\cdots\; 0 \;\; 1 \;\; 0 \;\cdots\; 0\,]^T $$

with only the jth component not zero.
Instead of a single confidence interval and test of the mean and variance, there will now be r such intervals and r tests on the mean and variance. The power of such tests under deviations from the null hypothesis can be computed in a manner entirely analogous to that of the previous sections.

The sample mean and covariance are not the only sample characteristics that can be used to test hypotheses on the distribution of X. Alternative sample characteristics are discussed below; in many applications they will provide sufficiently powerful tests.

Consider the random variable

$$ Y_i = C(X_i - M) $$

where $C^T C = P^{-1}$ and $C^{-1}$ exists. Such a C can always be found because P is positive definite. Then

$$ E(Y_i) = 0, \qquad E(Y_i Y_i^T) = I \quad \text{(the identity matrix)} $$

The elements of $Y_i$ are independent, zero mean unit variance normally distributed variables, and

$$ U_i = \sum_{j=1}^{r} Y_i^j $$

where $Y_i^j$ is the jth element of the vector $Y_i$, is a zero mean normal variable with variance r. Since $Y_i$ is independent of $Y_k$ for $i \neq k$, $U_i$ is independent of $U_k$ for $i \neq k$, and

$$ U = \frac{1}{n} \sum_{i=1}^{n} U_i $$

is a zero mean normal variable with variance r/n.
Define

$$ W_i = Y_i^T Y_i \qquad (5.6.3) $$

Since the elements of $Y_i$ are zero mean unit variance normal variables, $W_i$ is a central chi square variable with r degrees of freedom, and $W_i$ is independent of $W_k$ for $i \neq k$. Then

$$ Z = \sum_{i=1}^{n} W_i \qquad (5.6.4) $$

is a central chi square variable with nr degrees of freedom. Now consider the variable

$$ Y_i' = C(X_i - \bar{X}) = Y_i + C(M - \bar{X}) $$

and define

$$ W_i' = Y_i'^T Y_i' = Y_i^T Y_i + (\bar{X} - M)^T P^{-1} (\bar{X} - M) - 2 Y_i^T C (\bar{X} - M) $$

With

$$ \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i = C(\bar{X} - M) $$

this becomes

$$ W_i' = Y_i^T Y_i + \bar{Y}^T \bar{Y} - 2 Y_i^T \bar{Y} \qquad (5.6.5) $$

Define

$$ Z' = \sum_{i=1}^{n} W_i' = \sum_{i=1}^{n} \left(Y_i^T Y_i - \bar{Y}^T \bar{Y}\right) \qquad (5.6.6) $$

It can be shown that Z' is a central chi square variable with (n-1)r degrees of freedom, and that Z' is independent of U.

Since $\sqrt{n/r}\,U$ is a zero mean unit variance normal variable, Z' is a central chi square variable with (n-1)r degrees of freedom, and U is independent of Z', the variable

$$ t = \frac{\sqrt{n(n-1)}\,U}{\sqrt{Z'}} $$

is a Student's distributed random variable with (n-1)r degrees of freedom. Define

$$ s^2 = \frac{Z'}{nr} = \frac{1}{nr} \sum_{i=1}^{n} \left(Y_i^T Y_i - \bar{Y}^T \bar{Y}\right) \qquad (5.6.7) $$

Then

$$ t = \sqrt{\frac{n-1}{r}}\,\frac{U}{s} \qquad (5.6.8) $$


After some manipulation, it can be shown that

$$ s^2 = \frac{1}{nr} \sum_{i=1}^{n} (X_i - \bar{X})^T P^{-1} (X_i - \bar{X}) \qquad (5.6.9) $$

Using the sampling characteristics (5.6.8) and (5.6.9), the null hypothesis that $M = M_0$ and $P = P_0$ can be tested. The proper test variables for this test are

$$ t = \sqrt{\frac{n-1}{r}}\,\frac{U}{s} \qquad (5.6.10) $$

and

$$ n r s^2 = \sum_{i=1}^{n} (X_i - \bar{X})^T P_0^{-1} (X_i - \bar{X}) \qquad (5.6.11) $$

where

$$ U = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{r} Y_i^j, \qquad Y_i = C_0 (X_i - M_0), \qquad C_0^T C_0 = P_0^{-1} $$

Under the null hypothesis, it has been shown that t has a Student's distribution and $n r s^2$ has a chi square distribution, each with (n-1)r degrees of freedom. Note that, unlike the tests of hypotheses about the distribution parameters of scalar normal variables, a mean test using (5.6.10) depends upon the hypothesized value of the covariance $P_0$. Unless $M = M_0$ and $P = P_0$, t does not have a Student's distribution, and a significant t could arise from a departure from the hypothesis $M = M_0$ or $P = P_0$, or both. However, it can be shown that t is not highly sensitive to departures from the hypothesis $P = P_0$. A significant t can therefore be used to reject the hypothesis $M = M_0$ alone, especially if the covariance test either accepts or does not strongly reject the hypothesis $P = P_0$. While the mean test depends somewhat upon the covariance hypothesis, the covariance test does not depend upon the mean hypothesis: the covariance test variable (5.6.11) has a chi square distribution if $P = P_0$, regardless of whether $M = M_0$.

The mean and covariance test variables (5.6.10) and 
(5.6.11) can be used to test the hypotheses outlined above 
in a fashion similar to that of Sections 5.4 and 5.5, as 
long as caution is employed in interpreting the results of 
such tests. 
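The test variables (5.6.10) and (5.6.11) can be sketched in plain Python. One valid choice of $C_0$ is $C_0 = L^{-1}$, where $L L^T = P_0$ is the Cholesky factorization; each $Y_i$ is then obtained by forward substitution. This choice is an assumption of the sketch, since the text only requires $C_0^T C_0 = P_0^{-1}$. The function names are hypothetical. Critical values would still come from Student's and chi square tables with (n-1)r degrees of freedom.

```python
import math


def cholesky(P):
    """Lower-triangular L with L L^T = P (P symmetric positive definite)."""
    r = len(P)
    L = [[0.0] * r for _ in range(r)]
    for i in range(r):
        for j in range(i + 1):
            acc = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(P[i][i] - acc)
            else:
                L[i][j] = (P[i][j] - acc) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def vector_mean_cov_tests(X, M0, P0):
    """Test variables (5.6.10) and (5.6.11) for H0: M = M0, P = P0.

    C0 = L^{-1} with L L^T = P0 satisfies C0^T C0 = P0^{-1}, so Y_i = C0 (X_i - M0)
    is obtained by forward substitution. Returns (t, n r s^2)."""
    n, r = len(X), len(M0)
    L = cholesky(P0)
    Y = [forward_solve(L, [x - m for x, m in zip(Xi, M0)]) for Xi in X]
    U = sum(sum(Yi) for Yi in Y) / n                        # U = (1/n) sum_i sum_j Y_i^j
    Ybar = [sum(Yi[j] for Yi in Y) / n for j in range(r)]
    Zp = sum((Yi[j] - Ybar[j]) ** 2 for Yi in Y for j in range(r))   # Z' = n r s^2
    s = math.sqrt(Zp / (n * r))
    t = math.sqrt((n - 1) / r) * U / s                      # Student's, (n-1) r dof under H0
    return t, Zp                                            # Zp: chi square, (n-1) r dof under H0


t, chi2 = vector_mean_cov_tests([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]], [0.0, 0.0],
                                [[1.0, 0.0], [0.0, 1.0]])
print(t, chi2)
```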

Now consider the case of vector random variables with time varying population parameters. The case of interest is when the population mean is time invariant but the population covariance varies with time. The population characteristics of the random variable X are then

$$ E(X_i) = M, \qquad E\left[(X_i - M)(X_i - M)^T\right] = P_i $$

The sample mean is a random variable with

$$ E(\bar{X}) = M, \qquad E\left[(\bar{X} - M)(\bar{X} - M)^T\right] = \frac{1}{n}\bar{P} $$

where

$$ \bar{P} = \frac{1}{n} \sum_{i=1}^{n} P_i $$

and the sample covariance is a random variable with

$$ E(S^2) = \frac{n-1}{n}\,\bar{P} $$

As before, the $X_i$ are assumed to be independent normal variables with the population parameters given above.

Because the population parameters vary with time, normalized variables must be used to obtain the sampling distributions of certain sampling characteristics. Consider the variable

$$ Y_i = C_i (X_i - M) $$

where $C_i$ is such that

$$ C_i^T C_i = P_i^{-1} \quad \text{and } C_i^{-1} \text{ exists.} $$

Then

$$ E(Y_i) = 0, \qquad E(Y_i Y_i^T) = I \qquad (5.6.12) $$

The elements of $Y_i$ are independent zero mean unit variance normal variables, and $Y_i$ is independent of $Y_j$ for $i \neq j$. Define

$$ \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i, \qquad S'^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})(Y_i - \bar{Y})^T $$
After some manipulation, it can be shown that

$$ S'^2 = \frac{1}{n} \sum_{i=1}^{n} \left[ (C_i X_i - \overline{CX}) + (\bar{C} - C_i) M \right] \left[ (C_i X_i - \overline{CX}) + (\bar{C} - C_i) M \right]^T \qquad (5.6.13) $$

where

$$ \bar{C} = \frac{1}{n} \sum_{i=1}^{n} C_i, \qquad \overline{CX} = \frac{1}{n} \sum_{i=1}^{n} C_i X_i $$

It can be shown that

1) $\bar{Y}$ is an r dimensional zero mean normal variable with covariance I/n.

2) for any fixed L, $n L^T S'^2 L / (L^T L)$ is a central chi square distributed variable with n-1 degrees of freedom.

3) $\bar{Y}$ and $S'^2$ are independently distributed.

4) for any fixed vector L, $\sqrt{n-1}\, L^T \bar{Y} / \sqrt{L^T S'^2 L}$ is a Student's distributed variable with n-1 degrees of freedom.


The hypothesis that $M = M_0$ and $P_i = P_{0i}$ can be tested in a fashion analogous to the tests outlined in this section for time invariant distribution parameters, except that the test variables are now

$$ t_j = \frac{\sqrt{n-1}\, L_j^T \bar{Y}}{\sqrt{L_j^T S'^2 L_j}}, \qquad j = 1, \ldots, r \qquad (5.6.14) $$

$$ \chi_j^2 = \frac{n\, L_j^T S'^2 L_j}{L_j^T L_j} \qquad (5.6.15) $$

where

$$ S'^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})(Y_i - \bar{Y})^T, \qquad \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} C_{0i}(X_i - M_0), \qquad C_{0i}^T C_{0i} = P_{0i}^{-1} $$

with $Y_i = C_{0i}(X_i - M_0)$.


Both (5.6.14) and (5.6.15) are functions of the sample values and of both $M_0$ and $P_{0i}$, unless $P_{0i}$ is time invariant. Tests using these test variables therefore do not provide independent tests of the mean and covariance. However, both variables can be used to test the hypothesis $M = M_0$ and $P_i = P_{0i}$, where $M_0$ and $P_{0i}$ are specified values. Rejection of the hypothesis by either test can imply that $M \neq M_0$, that $P_i \neq P_{0i}$, or both. Even though the tests are not independent, it can be shown that the mean test is more sensitive to departures from the mean hypothesis than to departures from the covariance hypothesis, and conversely for the covariance test.
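For the time varying case, the following is a minimal sketch of (5.6.14) and (5.6.15), with $L_j$ equal to the jth unit vector. For brevity, it assumes that each hypothesized $P_{0i}$ is diagonal, so that $C_{0i} = \mathrm{diag}(1/\sqrt{p})$ satisfies $C_{0i}^T C_{0i} = P_{0i}^{-1}$. A full covariance would require a factorization, as in the time invariant case. The function name and the small data set are illustrative.

```python
import math


def componentwise_tests(X, M0, P0_diag):
    """(t_j, chi2_j) of (5.6.14)-(5.6.15) with L_j the jth unit vector.

    X[i] is the r-vector observed at time i and P0_diag[i] the diagonal of the
    hypothesized covariance P_0i (a diagonal P_0i is an assumption of this sketch)."""
    n, r = len(X), len(M0)
    # Y_i = C_0i (X_i - M_0) with C_0i = diag(1 / sqrt(p)), so C_0i^T C_0i = P_0i^{-1}
    Y = [[(X[i][j] - M0[j]) / math.sqrt(P0_diag[i][j]) for j in range(r)] for i in range(n)]
    ybar = [sum(Y[i][j] for i in range(n)) / n for j in range(r)]
    results = []
    for j in range(r):
        s2_jj = sum((Y[i][j] - ybar[j]) ** 2 for i in range(n)) / n   # L_j^T S'^2 L_j
        t_j = math.sqrt(n - 1) * ybar[j] / math.sqrt(s2_jj)            # Student's, n-1 dof
        chi2_j = n * s2_jj                                             # chi square, n-1 dof
        results.append((t_j, chi2_j))
    return results


res = componentwise_tests([[1.0, 0.0], [2.0, 2.0], [0.0, 4.0]], [0.0, 0.0],
                          [[1.0, 1.0], [4.0, 1.0], [1.0, 4.0]])
print(res)
```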

As in the case of time invariant population parameters, the sample characteristics (5.6.14) and (5.6.15) are not the only characteristics that might be used to test hypotheses on the mean and covariance of X. In a manner analogous to the previous work, define

$$ s^2 = \frac{1}{nr} \sum_{i=1}^{n} \left(Y_i^T Y_i - \bar{Y}^T \bar{Y}\right) \qquad (5.6.16) $$

$$ t = \sqrt{\frac{n-1}{r}}\,\frac{U}{s} \qquad (5.6.17) $$

where here

$$ Y_i = C_i (X_i - M), \qquad U = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{r} Y_i^j $$

As before, $n r s^2$ has a chi square distribution with (n-1)r degrees of freedom, and t has a Student's distribution with (n-1)r degrees of freedom. Using these sampling characteristics, the null hypothesis $M = M_0$ and $P_i = P_{0i}$ can be tested. The proper test variables for this test are (5.6.16) and (5.6.17), with $M_0$ replacing M and $P_{0i}$ replacing $P_i$. The two tests are not independent tests of the mean and covariance, so caution should be used in interpreting their results.


5.7 Application of Hypothesis Tests to Maximum Likelihood State Estimation

In Chapter 2, the recursive maximum likelihood state estimation equations were derived for a linear dynamical system under the following assumptions:

1) the measurement and driving noises are independent, zero mean normal variables.

2) the covariances of the noises are known precisely.

3) no computational errors are made.

4) all of the parameters describing the dynamical system and the linear measurement are known precisely.

If all of these assumptions are valid, it can be shown that the measurement residual at any time k,

$$ \Delta z_k = z_k - H_k \hat{x}_{k|k-1} $$

is a zero mean normal variable, independent of the residuals at times other than k, with conditional covariance

$$ B_k = R_k + H_k P_{k|k-1} H_k^T $$

where $\Delta z_k$ and $B_k$ are computed using the values of the noise covariance parameters that the assumptions above treat as known.

The variable $\Delta z_k$ is exactly the kind of variable to which the previously given tests of the mean and covariance can be applied. Which set of tests is applied depends upon:

1) computational limitations

2) the desired power of the tests

3) the dimension of the residual

4) whether the estimation equations have reached a steady state in which $B_k$ is approximately constant, so that the population parameters of the residuals are time invariant

5) the need to isolate which component of the residual satisfies or violates the underlying assumptions
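To illustrate applying these tests to $\Delta z_k$, the sketch below runs a scalar Kalman filter on a hypothetical random walk model, $x_{k+1} = x_k + w_k$, $z_k = x_k + v_k$ (so $H_k = 1$). It normalizes each residual by $\sqrt{B_k}$ and forms the mean and variance test variables of Sections 5.4 and 5.5 with $\sigma_0^2 = 1$. The model, noise values, seed, and function names are assumptions made for the example.

```python
import math
import random


def simulate_measurements(n, q_true, r_true, seed=3):
    """Hypothetical scalar random walk: x_{k+1} = x_k + w_k, z_k = x_k + v_k."""
    rng = random.Random(seed)
    x, z = 0.0, []
    for _ in range(n):
        z.append(x + rng.gauss(0.0, math.sqrt(r_true)))
        x += rng.gauss(0.0, math.sqrt(q_true))
    return z


def normalized_residuals(z, q, r, x0=0.0, p0=1.0):
    """Filter with assumed noise variances q, r; return dz_k / sqrt(B_k)."""
    x, p = x0, p0                      # x_{k|k-1}, P_{k|k-1}
    out = []
    for zk in z:
        b = r + p                      # B_k = R + H P_{k|k-1} H^T with H = 1
        dz = zk - x                    # residual dz_k = z_k - H x_{k|k-1}
        out.append(dz / math.sqrt(b))
        gain = p / b
        x += gain * dz                 # x_{k|k}, which is also x_{k+1|k} for a random walk
        p = (1.0 - gain) * p + q       # P_{k+1|k} = P_{k|k} + Q
    return out


def residual_tests(y):
    """Mean and variance test variables for normalized residuals (H0: zero mean, unit variance)."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / n
    t = math.sqrt(n - 1) * ybar / math.sqrt(s2)   # compare with Student's, n-1 dof
    chi2 = n * s2                                   # compare with chi square, n-1 dof
    return t, chi2


n = 500
z = simulate_measurements(n, q_true=0.01, r_true=1.0)
t_ok, chi_ok = residual_tests(normalized_residuals(z, q=0.01, r=1.0))
t_bad, chi_bad = residual_tests(normalized_residuals(z, q=0.01, r=0.1))
print(chi_ok / (n - 1), chi_bad / (n - 1))
```

With the correct R, the variance statistic divided by n-1 is near 1. When the filter assumes R = 0.1 instead of the true 1.0, the statistic is several times larger, so the chi square test rejects the assumed noise statistics.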

If the residuals fail the hypothesis tests, the tests themselves do not say why; they merely indicate that one or more of the assumptions is probably in error. It is up to the analyst to identify which assumption is likely to be in error and to adjust the assumptions until the residuals pass the required hypothesis tests.

If all four of the previous assumptions are taken together as the null hypothesis to be tested, it is very difficult to compute the power of the tests under deviations from the null hypothesis. To compute the test power, the distribution of the sample characteristics under deviations from the null hypothesis must be found, which is very difficult for general deviations. The power of the test can be computed only when the possible deviations from the null hypothesis are relatively simple, for example errors in the covariances of the noises. Even then, when the residuals are vector valued with time varying conditional covariance, computing the test power remains difficult. However, even if the test power is not known accurately, the tests can be expected to indicate significant deviations from the null hypothesis, which is their primary purpose.

The measurement residual is not the only observable random variable on which hypothesis tests can be based. Consider the situation in which the values of R and Q used to compute the state estimation weighting matrices may be in error, and a hypothesis concerning the values of these parameters is to be tested. In Chapter 3, it was shown that the score

$$ S_n(Z_n, \alpha) \triangleq \frac{\partial L_n(Z_n, \alpha)}{\partial \alpha} $$

evaluated at the true value of the parameters is asymptotically normal with zero mean and covariance $J_n(\alpha_0)$, where $\alpha_0$ is the true value of $\alpha$. In the case of state and noise variance estimation, the parameter set $\alpha$ consists of the state $x_n$ and the vector $\xi^T = (R_{11}, \ldots, Q_{11}, \ldots)$ of noise covariance elements. If the score is computed as a function of the measurements and the a priori values of R and Q, a large score will indicate that the a priori values of R and Q are probably in error.

Only those components of the score corresponding to differentiation with respect to $\xi$ are useful in testing the hypothesis on R and Q. This is because, as was seen earlier,

$$ S_{x_n} \triangleq \frac{\partial L_n(Z_n, \alpha)}{\partial x_n} \qquad (5.7.1) $$

evaluated at the estimated value of $x_n$ is identically zero, regardless of the true value of R and Q. However, the quantity

$$ S_{\xi_n} \triangleq \frac{\partial L_n(Z_n, \alpha)}{\partial \xi} \qquad (5.7.2) $$

is a useful indicator for testing the hypothesis. If the null hypothesis $R = R_0$ and $Q = Q_0$ is true, then $S_{\xi_n}$ is asymptotically a zero mean normal variable with covariance

$$ \operatorname{cov}(S_{\xi_n}) = E\left[ \frac{\partial L_n}{\partial \xi} \left(\frac{\partial L_n}{\partial \xi}\right)^T \right] \triangleq J_n^{\xi}(\alpha) $$


There are two functions of the score (5.7.2) which 
might be used for hypothesis testing. Define 


where 


t = C (Z ,£) 

n n n' 


c n 


Then each component of the vector t is asymptotically an 
independent zero mean unit variance normal variable. Tests 
on t can be conducted using the results of tests on the mean 
of a random variable with known variance. A significant t 
will indicate that one or more elements of the a priori 
values of R and Q are in error. 
Another possible test variable is defined by

$$ \chi^2 = S_{\xi_n}^T \left( J_n^{\xi} \right)^{-1} S_{\xi_n} $$

It can be shown that under the null hypothesis, χ² is asymptotically a central chi square random variable with r degrees of freedom, where r is the dimension of the vector ξ. Tests on this variable can be conducted using the results of tests on the variance previously outlined.
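A minimal sketch of the chi square test, assuming S_{ξ_n} and J_n^ξ are given as plain lists. The statistic is computed by solving J y = S rather than inverting J, and it equals the squared length of the whitened score t_n. In this chapter r = 2 (scalar R and Q) or r = 4 (Run 5), both even, so the exact even-degree closed form of the chi square tail probability suffices; for odd r one would use a library routine such as `scipy.stats.chi2.sf`. The input values are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def chi_square_statistic(S, J):
    """chi^2 = S^T J^{-1} S, computed without forming the inverse explicitly."""
    y = solve(J, S)
    return sum(s_i * y_i for s_i, y_i in zip(S, y))


def chi_square_sf_even(x, r):
    """P(chi^2_r > x); this closed form holds only for an even number r of degrees of freedom."""
    if r % 2 != 0:
        raise ValueError("closed form requires even r")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(r // 2))


S = [2.0, 3.0]                      # hypothetical score for xi = (R, Q)
J = [[4.0, 2.0], [2.0, 2.0]]        # hypothetical score covariance
chi2 = chi_square_statistic(S, J)   # r = 2 degrees of freedom here
print(chi2, chi_square_sf_even(chi2, 2))
```

A small tail probability (for example below 0.05) would indicate that the a priori values of R and Q are probably in error.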

It is difficult to assess the relative power of these 
two tests under deviations from the null hypothesis. Even if 
the distribution of the score under deviations from the null 
hypothesis could be found, the power of the tests could be 
found only after great computational expense. However, these
tests do have the distinct advantage of using test parameters
which allow a determination of the first linear correction in 
the a priori values of R and Q if the hypothesis test fails, 
using the results of the linearized maximum likelihood solu- 
tion discussed in Chapter 4. 
Chapter 6
NUMERICAL RESULTS 


6.1 Introduction

Theoretical results about various techniques for esti- 
mating noise covariance parameters and testing statistical 
hypotheses have been developed in the preceding chapters. 

This chapter is devoted to a discussion of the results of a 
digital computer simulation of the equations derived. The 
purpose of this simulation is twofold. The theoretical 
results must be checked to ensure that they accurately portray 
the situation. Once the validity of these results is 
established, a numerical comparison of the various techniques 
for estimating the noise covariance parameters will be made 
to determine the trade-offs involved in using simpler but
less accurate methods of estimation.

The principal theoretical results that are to be checked are:

1) convergence of the iterative maximum likelihood 
solution 

2) the unbiasedness of the maximum likelihood solution 

3) comparison of the actual mean squared estimation 
error of the maximum likelihood solution with the 
inverse information matrix 

4) the range of applicability of the linearized 
maximum likelihood solution 
5) convergence of the near maximum likelihood solution

6) comparison of actual mean and mean squared estimation 
error of the explicit suboptimal solution with the 
theoretical expressions for the quantities 

7) the sensitivity and power of hypothesis testing in 
realistic situations 

The system simulated was purposely made simple. Many 
of the estimation equations are very complex and require 
iterative solutions. Only by limiting the complexity of the 
system and the number of parameters to be estimated could 
the required computations be kept within reasonable limits. 

In checking the above theoretical results, Monte Carlo simulations are required. Many trials must be run in which actual noises and realistic parameter values are simulated, which is a time-consuming and expensive procedure. However, once the theoretical results are established, Monte Carlo simulations are no longer required. This allows a statistical simulation in which only the expressions for the mean and mean squared error of the estimates are computed, yielding the ensemble average of the results that would be obtained if a large series of Monte Carlo simulations were performed.

6.2 Description of System and Measurement

The system simulated is a second order damped oscillator with time invariant damping ratio and natural frequency, driven by stationary zero mean uncorrelated normally distributed noise. The state of the system is defined as a two component column vector of the position and velocity coordinates of the system.


$$ x_n = \Phi(n,n-1)\, x_{n-1} + \Gamma_n w_n $$

where x_n is the state at time n
      x_{n-1} is the state at time n-1
      Φ(n,n-1) is the 2 × 2 state transition matrix
      w_n is either a scalar or a 2 × 1 column vector driving noise
      Γ_n is either a 2 × 1 or a 2 × 2 forcing function matrix
      Q is the driving noise covariance matrix

The state transition matrix obeys the differential equation

$$ \frac{d\Phi(t,t_0)}{dt} = F(t)\,\Phi(t,t_0), \qquad \Phi(t_0,t_0) = I $$


For a second order oscillator with time invariant parameters,

$$ F = \begin{bmatrix} 0 & 1 \\ -\Omega^2 & -2\zeta\Omega \end{bmatrix} $$

where ζ is the damping ratio and Ω is the system natural frequency.
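For constant F the transition matrix is Φ(n,n-1) = exp(F Δt). The sketch below uses the standard closed form for an underdamped oscillator (valid for 0 ≤ ζ < 1), which is a textbook result rather than a formula quoted from this thesis, and cross-checks it against a truncated Taylor series of the matrix exponential. The parameter values ζ = 0.05, Ω = 0.1 rad/sec and Δt = 1 sec are those listed in Section 6.3.

```python
import math


def oscillator_transition(zeta, omega, dt):
    """Closed-form Phi(dt) = exp(F dt) for F = [[0, 1], [-omega^2, -2 zeta omega]], 0 <= zeta < 1."""
    sigma = zeta * omega                     # decay rate
    wd = omega * math.sqrt(1.0 - zeta ** 2)  # damped natural frequency
    e = math.exp(-sigma * dt)
    c, s = math.cos(wd * dt), math.sin(wd * dt)
    return [[e * (c + sigma / wd * s), e * s / wd],
            [-omega ** 2 * e * s / wd, e * (c - sigma / wd * s)]]


def expm_series(F, dt, terms=30):
    """Truncated Taylor series of exp(F dt) for a 2x2 F, used only as a cross-check."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        # term becomes (F dt)^k / k!
        term = [[sum(term[i][m] * F[m][j] for m in range(2)) * dt / k
                 for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


zeta, omega, dt = 0.05, 0.1, 1.0   # values used in Section 6.3
F = [[0.0, 1.0], [-omega ** 2, -2.0 * zeta * omega]]
phi = oscillator_transition(zeta, omega, dt)
check = expm_series(F, dt)
print(max(abs(phi[i][j] - check[i][j]) for i in range(2) for j in range(2)))
```

A convenient sanity check is det Φ = exp(trace(F) Δt) = exp(-2ζΩΔt).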
The measurement of the state is either a scalar or a 2 × 1 column vector defined by

$$ z_n = H_n x_n + v_n $$

where H_n is a 1 × 2 or 2 × 2 time invariant measurement matrix
      v_n is a scalar or 2 × 1 column vector measurement noise
      R is the measurement noise covariance matrix

In the simulation of variance estimation, the values of 
the diagonal elements of the measurement and driving noise 
covariance matrices are chosen from a Gamma distribution as 
described in Chapter 3. 


6.3 Effect of Incorrect Noise Covariance Parameters Upon Maximum Likelihood State Estimation

In Section 2.3, equations were derived for the evaluation of the performance of a maximum likelihood state estimator when incorrect values of the measurement and driving noise covariance matrices are used in the computation of the measurement residual weighting matrices. It was shown that even if incorrect values of the noise parameters are used, the maximum likelihood estimator remains unbiased. However, the covariance of the estimation error is a function of the errors in the noise parameters. In this simulation the "true" and "computed" state covariance matrices are calculated as functions of the true and assumed values of the measurement and driving noise covariance matrices.

From (2.3.39) and (2.3.42), the computed state covariance matrix obeys the recursive relationships

$$ P^*_{n|n} = (I - A^*_n H_n)\, P^*_{n|n-1}\, (I - A^*_n H_n)^T + A^*_n R^* A^{*T}_n $$

$$ P^*_{n|n-1} = \Phi(n,n-1)\, P^*_{n-1|n-1}\, \Phi^T(n,n-1) + \Gamma_n Q^* \Gamma_n^T $$

$$ A^*_n = P^*_{n|n-1} H_n^T \left( R^* + H_n P^*_{n|n-1} H_n^T \right)^{-1} $$

where R* and Q* are the assumed values of the measurement and driving noise covariance matrices.

From (2.3.43) and (2.3.44), the true state covariance matrix obeys the recursive relationships

$$ P_{n|n} = (I - A^*_n H_n)\, P_{n|n-1}\, (I - A^*_n H_n)^T + A^*_n R A^{*T}_n $$

$$ P_{n|n-1} = \Phi(n,n-1)\, P_{n-1|n-1}\, \Phi^T(n,n-1) + \Gamma_n Q \Gamma_n^T $$

where R and Q are the true values of the measurement and driving noise covariance matrices. It is assumed that P*_{0|0} = P_{0|0}.
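The two recursions can be run side by side. The sketch below assumes scalar measurement and driving noise with H_n = (1, 0) and Γ_n^T = (0, 1), as in the parameter list of this section. It propagates both matrices with the same gain A*_n and returns their traces. The initial covariance and the true R and Q are hypothetical values chosen only for illustration, and the transition matrix uses the standard underdamped oscillator closed form.

```python
import math


def mat_mul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def add(A, B, scale_b=1.0):
    """Return A + scale_b * B."""
    return [[a + scale_b * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def oscillator_transition(zeta, omega, dt):
    """Phi(dt) for the underdamped oscillator (0 <= zeta < 1)."""
    sigma, wd = zeta * omega, omega * math.sqrt(1.0 - zeta ** 2)
    e, c, s = math.exp(-sigma * dt), math.cos(wd * dt), math.sin(wd * dt)
    return [[e * (c + sigma / wd * s), e * s / wd],
            [-omega ** 2 * e * s / wd, e * (c - sigma / wd * s)]]


def covariance_traces(phi, gamma, h, P0, R, Q, R_star, Q_star, n_meas):
    """Return (trace P*_{n|n}, trace P_{n|n}) after n_meas scalar measurements.

    P* is propagated with the assumed R*, Q*; the true P uses the true R, Q but the
    same gain A*_n, since the filter only ever sees the assumed values.
    """
    GG = [[gi * gj for gj in gamma] for gi in gamma]   # Gamma Gamma^T
    P_star = [row[:] for row in P0]
    P_true = [row[:] for row in P0]
    for _ in range(n_meas):
        Ps = add(mat_mul(mat_mul(phi, P_star), transpose(phi)), GG, Q_star)
        Pt = add(mat_mul(mat_mul(phi, P_true), transpose(phi)), GG, Q)
        PsH = [sum(Ps[i][k] * h[k] for k in range(2)) for i in range(2)]   # P* H^T
        innov = R_star + sum(h[i] * PsH[i] for i in range(2))              # R* + H P* H^T
        A = [x / innov for x in PsH]                                        # gain A*_n
        IAH = [[(1.0 if i == j else 0.0) - A[i] * h[j] for j in range(2)] for i in range(2)]
        AAt = [[A[i] * A[j] for j in range(2)] for i in range(2)]
        # Joseph form: (I - A H) P (I - A H)^T + A r A^T
        P_star = add(mat_mul(mat_mul(IAH, Ps), transpose(IAH)), AAt, R_star)
        P_true = add(mat_mul(mat_mul(IAH, Pt), transpose(IAH)), AAt, R)
    return P_star[0][0] + P_star[1][1], P_true[0][0] + P_true[1][1]


phi = oscillator_transition(0.05, 0.1, 1.0)   # zeta, Omega (rad/sec), dt (sec)
h, gamma = [1.0, 0.0], [0.0, 1.0]             # H_n = (1, 0), Gamma_n^T = (0, 1)
P0 = [[10.0, 0.0], [0.0, 10.0]]               # hypothetical initial covariance
R_true, Q_true = 1.0, 0.5                     # hypothetical true noise variances
for R_star, Q_star in [(1.0, 0.5), (4.0, 0.5), (1.0, 0.05)]:
    print(R_star, Q_star, covariance_traces(phi, gamma, h, P0, R_true, Q_true, R_star, Q_star, 200))
```

When R* = R and Q* = Q the two traces coincide. For any other assumed values the true trace can only be larger, since the Kalman gain computed from the true noise statistics is the minimum variance linear gain, while the computed trace may be larger or smaller.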
The following graphs show the variation in the trace of
the true and computed covariance matrices after the last 
measurement as a function of the estimated values of R and Q. 
For simplicity the measurement and driving noises are scalar 
random variables and the time interval between measurements 
is constant. The parameters of the system and the measure- 
ments are: 

ζ = 0.05, Ω = 0.1 rad/sec
H_n = (1, 0), Γ_n^T = (0, 1)
Time between measurements = 1 sec
Total number of measurements = 200


P | 
o o 



0 

10 


For the values of the system and noise parameters chosen, the covariance equations reach a steady state after approximately 10 measurements.


It should be noted that each of the two diagonal elements of P*_{n|n} had the same general variation with R* and Q* as the trace of P*_{n|n}. For simplicity, only graphs of the trace versus R* and Q* are shown.












As can be seen, the trace of the true state estimation error covariance matrix P_{n|n} is not highly sensitive to errors in the estimated values of R or Q. This means that the estimation error is not highly sensitive to the use of incorrect values of R and Q in the computation of the measurement residual weighting matrices. However, the trace of the computed state estimation error covariance matrix P*_{n|n} is a strong function of errors in R* and Q*. This means that for moderate errors in R* and Q*, the computed covariance matrix is a poor representation of the actual state estimation error covariance. Although the actual error covariance may be small, there is no way to know this unless R* and Q* are very near the true values of R and Q. Therefore any decision made about the probable state estimation error using P*_{n|n} may be incorrect due to the large difference between P*_{n|n} and P_{n|n}.

6.4 Comparison of State and Noise Covariance Estimation Procedures

Four procedures for estimating the state and noise 
covariance parameters are simulated and compared: 

1) maximum likelihood 

2) linearized maximum likelihood 

3) near maximum likelihood 

4) explicit suboptimal 

The simulations are divided into two parts, Monte Carlo 
simulations and statistical simulations. 
Maximum Likelihood and Linearized Maximum Likelihood


The equations for the simultaneous estimation of the system state and noise covariance parameters with a priori information about the state and noise covariance were solved by the iterative procedures of Section 3.6. In Chapter 3, it was shown that the asymptotic distribution of the R and Q estimation error is a zero mean normal distribution with conditional covariance W_n(ξ), where W_n^{-1}(ξ) is a submatrix of the conditional information matrix, and

$$ \xi^T = (R^{11}, \ldots, R^{\gamma\gamma}, Q^{11}, \ldots, Q^{nn}) $$

$$ W_n^{-1}(\xi) = E\left[ \frac{\partial L_n}{\partial \xi} \left( \frac{\partial L_n}{\partial \xi} \right)^T \,\middle|\, \xi \right] $$

In an actual situation, the above matrix cannot be computed because the true value of ξ is unknown. However, it is usually a good approximation to compute W_n^{-1} at the estimated value of ξ if a measure of the R and Q estimation error covariance is desired. All evaluations of the conditional information matrices in this section are at the true value of ξ.

In the case of scalar R and Q, W_n(ξ) is a 2 × 2 matrix with elements

$$ W_n(\xi) = \begin{bmatrix} E[(\Delta R)^2] & E[\Delta R\,\Delta Q] \\ E[\Delta R\,\Delta Q] & E[(\Delta Q)^2] \end{bmatrix} $$

where ΔR and ΔQ represent the R and Q estimation errors. The square root of the appropriate diagonal element of W_n(ξ) is the standard deviation of the asymptotic estimation error of the corresponding noise covariance parameter. The normalized estimation error can then be defined by

$$ e_R = \frac{\hat{R} - R}{\sigma_R} \qquad (6.4.1) $$

where R̂ is the estimate of R on a given trial, R is the true value of R on that trial, and σ_R is the standard deviation of the error as given above. A similar expression is used to define the normalized Q estimation error.
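As a small check of (6.4.1), the sketch below computes σ from the diagonal of a given W_n and the normalized error of one trial. The worked values come from the tables in this section: Sample 7 of Run 2 (R = 37.523, R̂ = 27.297, σ_R = 3.980) gives the 2.57 sigma error discussed below, and a W_n diagonal entry of 0.0184 gives the σ_R = 0.136 listed in Table 6.3. The function names are illustrative only.

```python
import math


def sigmas_from_W(W):
    """Standard deviations: square roots of the diagonal of the error covariance W_n."""
    return [math.sqrt(W[i][i]) for i in range(len(W))]


def normalized_error(estimate, true_value, sigma):
    """e = (estimate - true) / sigma, as in (6.4.1)."""
    return (estimate - true_value) / sigma


# Sample 7 of Monte Carlo Run 2 (Table 6.2): R = 37.523, R_hat = 27.297, sigma_R = 3.980
print(round(normalized_error(27.297, 37.523, 3.980), 2))                   # -2.57
print(round(sigmas_from_W([[0.0184, -0.0039], [-0.0039, 0.0101]])[0], 3))  # 0.136
```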

In order to check the theoretical unbiasedness and covariance of the estimates, the mean and mean squared error of the estimates over the ensemble of trials are computed and compared with the mean of the true values of R and Q and the average conditional information matrix. The average R is defined by

$$ \operatorname{ave}(R) = \frac{1}{K} \sum_{j=1}^{K} R_j \qquad (6.4.2) $$

where R_j is the value of R on the jth trial and K is the number of trials. A similar expression is used to compute ave(Q). The average of R̂ is defined by

$$ \operatorname{ave}(\hat{R}) = \frac{1}{K} \sum_{j=1}^{K} \hat{R}_j \qquad (6.4.3) $$

where R̂_j is the value of R̂ on the jth trial. A similar expression is used to compute ave(Q̂).

The theoretical mean squared estimation error, averaged over the ensemble of all possible R and Q, was given in Chapter 3 by

$$ E\left[ (\hat{\xi} - \xi)(\hat{\xi} - \xi)^T \right] = \int W_n(\xi)\, f(\xi)\, d\xi $$

As was noted, the above integral is difficult to compute. An easier-to-compute, and possibly better, measure of the average conditional covariance over the ensemble of values of ξ actually experienced in the trials would be

$$ \bar{W}_n = \frac{1}{K} \sum_{j=1}^{K} W_n(\xi_j) \qquad (6.4.4) $$

where ξ_j is the value of ξ on the jth trial. The actual mean squared estimation error matrix is defined by

$$ \frac{1}{K} \sum_{j=1}^{K} (\hat{\xi}_j - \xi_j)(\hat{\xi}_j - \xi_j)^T \qquad (6.4.5) $$
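Equations (6.4.2) to (6.4.5) are simple ensemble averages. A minimal sketch follows, assuming each trial's true and estimated parameter vectors are stored as lists. As a usage example it takes the iterative maximum likelihood estimates of Run 3 (Table 6.3, where R = 1.0 and Q = 0.5 on every trial). To the printed precision it reproduces the averages 0.995 and 0.479 and the actual iterative mean squared error matrix reported for that run.

```python
def ensemble_statistics(true_values, estimates):
    """Sample averages (6.4.2), (6.4.3) and the actual mean squared error matrix (6.4.5).

    true_values and estimates are lists of K parameter vectors xi_j and xi_hat_j.
    """
    K, p = len(true_values), len(true_values[0])
    ave_true = [sum(x[i] for x in true_values) / K for i in range(p)]
    ave_est = [sum(x[i] for x in estimates) / K for i in range(p)]
    mse = [[0.0] * p for _ in range(p)]
    for xi, xi_hat in zip(true_values, estimates):
        d = [xi_hat[i] - xi[i] for i in range(p)]
        for a in range(p):
            for b in range(p):
                mse[a][b] += d[a] * d[b] / K
    return ave_true, ave_est, mse


def average_W(W_list):
    """Average conditional covariance (6.4.4) over the values of xi met in the trials."""
    K, p = len(W_list), len(W_list[0])
    return [[sum(W[a][b] for W in W_list) / K for b in range(p)] for a in range(p)]


# Iterative M.L. estimates of (R, Q) from Table 6.3, where R = 1.0 and Q = 0.5 on every trial
R_hat = [1.130, 1.295, 0.982, 1.106, 0.921, 0.936, 0.821, 0.807, 0.908, 1.043]
Q_hat = [0.421, 0.357, 0.534, 0.495, 0.392, 0.556, 0.575, 0.465, 0.712, 0.287]
truth = [[1.0, 0.5]] * 10
ave_true, ave_est, mse = ensemble_statistics(truth, [[r, q] for r, q in zip(R_hat, Q_hat)])
print([round(v, 3) for v in ave_est])                # [0.995, 0.479]
print([[round(v, 4) for v in row] for row in mse])   # [[0.0205, -0.0084], [-0.0084, 0.014]]
```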

Similar expressions are used to compute the mean and 
mean squared estimation error of the linearized solution of 
the likelihood equations. The conditional information matrix 
associated with the linearized solution is computed at the 
a priori values of R and Q. If these values are not close to 
the true values, the information matrix computed at the a priori
values may not accurately represent the inverse of the esti- 
mation error covariance. However, it represents the best 
measure of the estimation error covariance that would be 
available for a real time solution without having to recompute 
the information matrix at the linearized estimates of R and Q. 

Figure 6.5 and Table 6.1 give the results of a ten-sample Monte Carlo simulation. The system and measurement parameter values are those previously given, while the true values of R and Q are different on each trial. The values are selected from a Gamma distribution with

$$ E(R) = \bar{R} = 1.0, \qquad E(Q) = \bar{Q} = 0.5 $$

$$ E[(R - \bar{R})^2] = 1.0, \qquad E[(Q - \bar{Q})^2] = 0.25 $$
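Drawing the true noise variances for each trial only requires a Gamma distribution with the stated mean and variance. The sketch below assumes the usual moment match, shape k = mean²/variance and scale θ = variance/mean, since the exact parametrization of Chapter 3 is not reproduced here. It uses Python's `random.Random.gammavariate(alpha, beta)`, whose mean is alpha·beta. For the moments quoted in this section the shape works out to k = 1, that is, an exponential distribution. The seed is arbitrary.

```python
import random


def gamma_shape_scale(mean, variance):
    """Moment-matched Gamma parameters: shape k = mean^2 / var, scale theta = var / mean."""
    return mean ** 2 / variance, variance / mean


def draw_noise_variances(rng, n_trials, r_moments, q_moments):
    """Draw true (R, Q) pairs for n_trials Monte Carlo trials."""
    kr, thr = gamma_shape_scale(*r_moments)
    kq, thq = gamma_shape_scale(*q_moments)
    return [(rng.gammavariate(kr, thr), rng.gammavariate(kq, thq)) for _ in range(n_trials)]


rng = random.Random(1968)   # arbitrary seed
trials = draw_noise_variances(rng, 10, (1.0, 1.0), (0.5, 0.25))   # Run 1 moments
print(gamma_shape_scale(1.0, 1.0), gamma_shape_scale(0.5, 0.25))  # (1.0, 1.0) (1.0, 0.5)
```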

If there were no estimation error, the estimates of R and Q would lie along the diagonal lines R̂ = R and Q̂ = Q. The dispersion about these lines is a measure of the estimation error.

Shown in Table 6.1 are the standard deviation of the R 
and Q estimation error and the normalized estimation error 
defined by (6.4.1). Also shown are the results of the 
linearized maximum likelihood solution. The estimates of R 
and Q are those obtained on the first iteration of the optimal 
solution. As described in Section 4.2, the linearized solu- 
tion represents an estimate of R and Q that can be obtained 
in real time. The initial estimates of R and Q were equal
to the means of their respective distributions. As can be 
seen, the linearized solution is quite close to the iterative 
solution even for large departures of the true values of R 
and Q from the initial estimates which were used to compute 
the score and conditional information matrices. 

Figure 6.6 and Table 6.2 show the results of another 
set of ten Monte Carlo trials with a different set of random 
numbers used to simulate the noises and a different set of 
values of R and Q chosen from a Gamma distribution with 

$$ E(R) = 10, \qquad E(Q) = 1 $$

$$ E[(R - \bar{R})^2] = 100, \qquad E[(Q - \bar{Q})^2] = 1 $$

Again the actual mean and mean squared estimation error 
over the ensemble of ten trials were computed and compared 
with the theoretical results. It can be seen that the mean of 
the estimates compares quite well with the mean of the actual 
values of R and Q. However, there is a rather large differ- 
ence between the theoretical and actual mean squared estimation 
error matrices. This is a hazard of trying to compute ensemble 
statistics on the basis of ten samples. Almost all of the 
actual mean squared R estimation error comes from Sample 7, the 
error being nonrepresentative of the expected error. The actual 
error was a 2.57 sigma error based upon the standard deviation 
obtained from the conditional information matrix. Omitting this 
sample from the ensemble averages results in good agreement be- 
tween theoretical and actual mean squared errors. 
Table 6.1 Monte Carlo Run 1: Maximum Likelihood and Linearized Maximum Likelihood Solutions

Sample    R        R̂ (M.L.)   σ_R      e_R      R̂ (Lin.)
1         0.860    0.979      0.118    +1.014   0.978
2         1.599    1.461      0.207    -0.670   1.449
3         0.522    0.384      0.074    -1.876   0.391
4         1.288    1.276      0.165    -0.074   1.286
5         0.526    0.404      0.092    -1.322   0.394
6         0.102    0.096      0.021    -0.276   0.096
7         0.304    0.291      0.047    -0.288   0.317
8         1.718    1.470      0.200    -1.386   1.489
9         0.160    0.118      0.052    -0.807   0.128
10        0.484    0.523      0.092    +0.425   0.514
Average   0.756    0.700      0.107    -0.526   0.704

Theoretical Mean Squared Estimation Error

$$ \text{Linearized: } \begin{bmatrix} 0.0184 & -0.0039 \\ -0.0039 & 0.0101 \end{bmatrix} \qquad \text{Iterative: } \begin{bmatrix} 0.0152 & -0.0035 \\ -0.0035 & 0.0163 \end{bmatrix} $$

Actual Mean Squared Estimation Error

$$ \text{Linearized: } \begin{bmatrix} 0.0121 & -0.0034 \\ -0.0034 & 0.0233 \end{bmatrix} \qquad \text{Iterative: } \begin{bmatrix} 0.0133 & -0.0034 \\ -0.0034 & 0.0204 \end{bmatrix} $$
Table 6.1 (Continued) Monte Carlo Run 1

Sample    Q        Q̂ (M.L.)   σ_Q      e_Q      Q̂ (Lin.)
1         0.456    0.445      0.093    -0.115   0.450
2         0.512    0.484      0.111    -0.260   0.483
3         0.330    0.447      0.066    +1.770   0.463
4         0.360    0.498      0.079    +1.740   0.483
5         0.997    1.286      0.174    +1.665   1.264
6         0.327    0.308      0.056    -0.345   0.362
7         0.329    0.328      0.062    -0.009   0.336
8         0.145    0.128      0.037    -0.480   0.120
9         1.487    1.194      0.221    -1.325   1.133
10        1.249    1.242      0.209    -0.004   1.214
Average   0.619    0.641      0.111    +0.264   0.636
Table 6.2 Monte Carlo Run 2: Maximum Likelihood and Linearized Maximum Likelihood Solutions

Sample    R         R̂ (M.L.)   σ_R      e_R      R̂ (Lin.)
1         8.231     7.584      1.120    -0.606   8.882
2         8.104     6.176      1.013    -1.900   6.303
3         10.359    8.369      1.123    -1.770   8.200
4         5.362     5.838      0.558    +0.855   5.283
5         14.526    17.012     1.780    +1.400   16.907
6         11.753    10.514     1.375    -0.900   10.522
7         37.523    27.297     3.980    -2.570   28.399
8         29.982    28.551     3.290    +0.436   30.378
9         10.499    11.929     1.316    +1.090   11.942
10        19.663    20.205     2.135    +0.254   18.914
Average   15.599    14.349     1.769    -0.371   14.543

Theoretical Mean Squared Estimation Error

$$ \text{Linearized: } \begin{bmatrix} 1.420 & -0.049 \\ -0.049 & 0.063 \end{bmatrix} \qquad \text{Iterative: } \begin{bmatrix} 4.191 & -0.092 \\ -0.092 & 0.163 \end{bmatrix} $$

Actual Mean Squared Estimation Error

$$ \text{Linearized: } \begin{bmatrix} 10.152 & 0.179 \\ 0.179 & 0.274 \end{bmatrix} \qquad \text{Iterative: } \begin{bmatrix} 12.494 & 0.425 \\ 0.425 & 0.115 \end{bmatrix} $$
Table 6.2 (Continued) Monte Carlo Run 2

Sample    Q        Q̂ (M.L.)   σ_Q       e_Q      Q̂ (Lin.)
1         3.985    3.237      0.828     -0.903   2.784
2         1.677    1.461      0.385     -0.562   1.422
3         0.089    0.044      0.029     -1.530   0.088
4         0.002    0.002      9×10⁻⁴    +0.000   0.046
5         2.476    2.833      0.582     +0.614   3.164
6         0.923    1.047      0.750     +0.165   1.048
7         0.614    0.786      0.188     +0.915   0.637
8         0.906    0.878      0.260     -0.017   0.592
9         2.361    2.895      0.537     +0.995   3.134
10        0.255    0.211      0.080     -0.547   0.455
Average   1.329    1.339      0.334     -0.087   1.337


Runs 3 and 4 and the corresponding Tables 6.3 and 6.4 are the results of the above two runs repeated, except that the values of R and Q are held fixed at the same values on each trial. Different random numbers were used to simulate the measurement and driving noises. These runs simulate an ensemble of trials with a fixed value of ξ, so that the conditional information matrix is the same for each trial. The theoretical mean squared estimation error is then given by W_n(ξ_o), where ξ_o is the value of ξ on every trial.

The agreement between the sample mean and mean squared 
estimation error and the theoretical results is quite good 
for both runs. A better correspondence between theoretical 
and actual results is expected in these runs than in the first 
two runs because the ten trials in each of these runs are 
samples from an ensemble of trials with different noises but 
with the same noise covariances. The first two runs were 
samples from an ensemble with different noises and different 
noise covariances. 
Table 6.3 Monte Carlo Run 3: Maximum Likelihood and Linearized Maximum Likelihood Solutions

Sample    R      R̂ (M.L.)   σ_R      e_R     R̂ (Lin.)
1         1.0    1.130      0.136    +0.96   1.134
2         1.0    1.295      0.136    +2.18   1.286
3         1.0    0.982      0.136    -0.13   0.977
4         1.0    1.106      0.136    +0.78   1.110
5         1.0    0.921      0.136    -0.58   0.935
6         1.0    0.936      0.136    -0.47   0.934
7         1.0    0.821      0.136    -1.32   0.810
8         1.0    0.807      0.136    -1.42   0.810
9         1.0    0.908      0.136    -0.72   0.939
10        1.0    1.043      0.136    +0.32   1.038
Average   1.0    0.995      0.136    -0.04   0.998

Theoretical Mean Squared Estimation Error

$$ \text{Linearized: } \begin{bmatrix} 0.0182 & -0.0036 \\ -0.0036 & 0.0096 \end{bmatrix} \qquad \text{Iterative: } \begin{bmatrix} 0.0184 & -0.0039 \\ -0.0039 & 0.0101 \end{bmatrix} $$

Actual Mean Squared Estimation Error

$$ \text{Linearized: } \begin{bmatrix} 0.0197 & -0.0076 \\ -0.0076 & 0.0118 \end{bmatrix} \qquad \text{Iterative: } \begin{bmatrix} 0.0205 & -0.0084 \\ -0.0084 & 0.0140 \end{bmatrix} $$


Table 6.3 (Continued) Monte Carlo Run 3

Sample    Q      Q̂ (M.L.)   σ_Q      e_Q     Q̂ (Lin.)
1         0.50   0.421      0.101    -0.79   0.418
2         0.50   0.357      0.101    -1.43   0.358
3         0.50   0.534      0.101    +0.34   0.537
4         0.50   0.495      0.101    -0.05   0.490
5         0.50   0.392      0.101    -1.08   0.387
6         0.50   0.556      0.101    +0.56   0.549
7         0.50   0.575      0.101    +0.75   0.587
8         0.50   0.465      0.101    -0.35   0.471
9         0.50   0.712      0.101    +2.12   0.672
10        0.50   0.287      0.101    -2.13   0.304
Average   0.50   0.479      0.101    -0.21   0.478
Table 6.4 Monte Carlo Run 4: Maximum Likelihood and Linearized Maximum Likelihood Solutions

Sample    R       R̂ (M.L.)   σ_R      e_R      R̂ (Lin.)
1         10.0    10.812     1.185    +0.715   10.858
2         10.0    9.527      1.185    -0.291   9.658
3         10.0    9.872      1.185    -0.105   9.876
4         10.0    9.750      1.185    -0.236   9.721
5         10.0    11.624     1.185    +1.330   11.573
6         10.0    11.385     1.185    +1.210   11.437
7         10.0    12.195     1.185    +2.120   12.514
8         10.0    11.748     1.185    +1.510   11.781
9         10.0    7.757      1.185    -0.985   7.831
10        10.0    9.761      1.185    -0.905   9.893
Average   10.0    10.443     1.185    +0.374   10.514

Theoretical Mean Squared Estimation Error

$$ \text{Linearized: } \begin{bmatrix} 1.405 & -0.059 \\ -0.059 & 0.058 \end{bmatrix} \qquad \text{Iterative: } \begin{bmatrix} 1.503 & -0.050 \\ -0.050 & 0.011 \end{bmatrix} $$

Actual Mean Squared Estimation Error

$$ \text{Linearized: } \begin{bmatrix} 1.970 & -0.321 \\ -0.321 & 0.106 \end{bmatrix} \qquad \text{Iterative: } \begin{bmatrix} 1.848 & -0.296 \\ -0.296 & 0.107 \end{bmatrix} $$
Table 6.4 (Continued) Monte Carlo Run 4

Sample    Q      Q̂ (M.L.)   σ_Q      e_Q      Q̂ (Lin.)
1         1.0    0.820      0.238    -0.757   0.814
2         1.0    0.641      0.238    -1.510   0.637
3         1.0    0.975      0.238    -0.105   0.976
4         1.0    1.341      0.238    +1.430   1.329
5         1.0    0.715      0.238    -1.200   0.737
6         1.0    0.785      0.238    -0.905   0.776
7         1.0    0.721      0.238    -1.170   0.645
8         1.0    0.648      0.238    -1.480   0.654
9         1.0    1.461      0.238    +1.930   1.413
10        1.0    0.494      0.238    -2.120   0.502
Average   1.0    0.860      0.238    -0.590   0.848
In Runs 1-4, the measurement and driving noise covariance matrices are scalars. In Run 5, both R and Q are 2 × 2 matrices, so that four noise covariance parameters, the diagonal elements of R and Q, are to be estimated. Figures 6.9 and 6.10 and Table 6.5 show the results of a ten-sample Monte Carlo simulation.

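As before, the theoretical and actual mean and mean squared estimation error matrices were computed. Now the mean squared estimation errors (both theoretical and actual) are 4 × 4 matrices, with elements

$$ \begin{bmatrix}
\overline{(\Delta R_0)^2} & \overline{\Delta R_0 \Delta R_1} & \overline{\Delta R_0 \Delta Q_0} & \overline{\Delta R_0 \Delta Q_1} \\
\overline{\Delta R_1 \Delta R_0} & \overline{(\Delta R_1)^2} & \overline{\Delta R_1 \Delta Q_0} & \overline{\Delta R_1 \Delta Q_1} \\
\overline{\Delta Q_0 \Delta R_0} & \overline{\Delta Q_0 \Delta R_1} & \overline{(\Delta Q_0)^2} & \overline{\Delta Q_0 \Delta Q_1} \\
\overline{\Delta Q_1 \Delta R_0} & \overline{\Delta Q_1 \Delta R_1} & \overline{\Delta Q_1 \Delta Q_0} & \overline{(\Delta Q_1)^2}
\end{bmatrix} $$

where ΔR and ΔQ represent the R or Q estimation error and the bar over these quantities indicates either the theoretical or actual mean, depending upon which matrix is given.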

As can be seen, increasing the number of quantities to 
be estimated did not degrade the performance of the estima- 
tor. Of course, the number of computations needed to estimate 
four covariance parameters is considerably greater than that 
needed to estimate two covariance parameters. 
Table 6.5 Monte Carlo Run 5: Maximum Likelihood and Linearized Maximum Likelihood Solutions

Sample    R_0       R̂_0 (M.L.)   σ_R0     e_R0     R̂_0 (Lin.)
1         13.857    11.806       1.578    -1.300   11.836
2         2.137     1.887        0.598    -0.416   2.373
3         0.483     0.313        0.452    -0.376   5.000
4         14.717    15.512       2.013    +0.396   15.047
5         4.553     4.677        1.030    +0.120   4.243
6         44.597    47.657       4.610    +0.665   46.803
7         2.692     2.778        0.546    +0.157   2.917
8         1.963     2.865        0.636    +1.410   2.407
9         2.698     2.281        0.724    -0.575   2.475
10        3.972     4.106        0.467    +0.287   4.089
Average   9.167     9.358        1.265    +0.037   9.772

Theoretical Mean Square Estimation Error Matrix (parameters ordered R_0, R_1, Q_0, Q_1)

$$ \begin{bmatrix} 3.086 & -0.166 & -0.383 & 0.037 \\ -0.166 & 3.824 & -0.692 & 0.047 \\ -0.383 & -0.692 & 6.302 & -0.621 \\ 0.037 & 0.047 & -0.621 & 0.325 \end{bmatrix} $$

$$ E(R_0) = 10, \qquad E[(R_0 - \bar{R}_0)^2] = 100 $$

$$ E(R_1) = 10, \qquad E[(R_1 - \bar{R}_1)^2] = 100 $$

$$ E(Q_0) = 10, \qquad E[(Q_0 - \bar{Q}_0)^2] = 100 $$

$$ E(Q_1) = 1, \qquad E[(Q_1 - \bar{Q}_1)^2] = 1 $$
Table 6.5 (Continued) Monte Carlo Run 5

Sample    R_1       R̂_1 (M.L.)   σ_R1     e_R1     R̂_1 (Lin.)
1         2.937     3.115        0.752    +0.237   3.892
2         8.702     9.063        1.057    +0.340   8.619
3         8.325     8.690        0.965    +0.400   9.639
4         31.170    35.171       3.400    +1.180   34.572
5         44.397    40.871       4.420    -0.800   40.560
6         5.500     4.835        1.600    -0.415   6.464
7         0.953     0.654        0.476    -0.630   1.349
8         2.823     2.636        0.670    -0.280   1.924
9         7.598     8.785        1.032    +1.150   8.317
10        1.473     1.146        0.272    -1.200   1.146
Average   11.388    11.450       1.464    -0.002   11.648

Actual Mean Squared Estimation Error (parameters ordered R_0, R_1, Q_0, Q_1)

$$ \begin{bmatrix} 1.551 & -0.043 & 1.101 & -0.055 \\ -0.043 & 3.070 & -1.626 & 0.165 \\ 1.101 & -1.626 & 4.305 & -0.289 \\ -0.055 & 0.165 & -0.289 & 0.618 \end{bmatrix} $$
Table 6.5 (Continued) Monte Carlo Run 5

Sample    Q_0       Q̂_0 (M.L.)   σ_Q0     e_Q0     Q̂_0 (Lin.)
1         4.550     2.953        2.135    -0.750   5.000
2         7.568     8.962        1.732    +0.803   8.707
3         13.596    13.824       2.480    +0.311   17.180
4         1.551     0.735        2.450    -0.333   0.306
5         0.621     2.249        2.410    +0.675   4.077
6         14.583    15.668       3.660    +0.295   13.408
7         25.451    26.445       2.980    +0.334   24.907
8         29.335    34.589       3.250    +1.640   33.971
9         13.677    11.912       2.040    -0.864   11.433
10        1.052     1.424        0.610    +0.610   1.398
Average   11.198    11.876       2.375    +0.272   12.039
Table 6.5 (Continued) Monte Carlo Run 5

Sample    Q_1      Q̂_1 (M.L.)   σ_Q1     e_Q1     Q̂_1 (Lin.)
1         4.255    4.183        0.885    -0.083   5.641
2         1.568    1.488        0.473    -0.169   1.423
3         5.173    7.393        1.043    +2.100   6.742
4         1.355    1.050        0.411    -0.742   1.032
5         3.357    2.482        0.762    -1.150   2.223
6         1.218    1.523        0.451    +0.744   1.502
7         0.662    0.421        0.316    -0.764   0.196
8         0.180    0.021        0.123    -1.280   0.564
9         0.328    0.433        0.164    +0.641   0.656
10        0.864    0.741        0.228    -0.540   0.751
Average   1.896    1.978        0.487    -0.134   2.073
A further check was made concerning the hypothesis that the R and Q estimation errors are zero mean normally distributed random variables with conditional covariance W_n(R,Q). Under this hypothesis, the normalized errors e_R and e_Q are zero mean, unit variance, normally distributed random variables. Define

$$ \bar{e}_R = \frac{1}{K} \sum_{j=1}^{K} e_R^j, \qquad s_R^2 = \frac{1}{K} \sum_{j=1}^{K} \left( e_R^j - \bar{e}_R \right)^2 $$

where e_R^j is the normalized R estimation error on the jth trial and K is the number of trials. Similar expressions are defined for the Q estimation errors. From Chapter 5, under the above hypothesis, ē_R is a zero mean normally distributed variable with variance 1/K, and K s_R² is a chi squared distributed variable with K − 1 degrees of freedom, with mean K − 1 and variance 2(K − 1). The quantities ē_R, ē_Q, s_R², and s_Q² were computed for each of the ten-sample Monte Carlo runs previously presented. In most cases, their computed values were within one standard deviation of their expected values under the above hypothesis. Therefore, the variations of the computed quantities about their means were within what would be expected given the relatively small sample size, and the above hypothesis can reasonably be accepted as valid.
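The check described above can be sketched in a few lines, assuming the normalized errors of one run are given as a list. The one-standard-deviation bands are ē within ±1/√K of 0 and K s² within ±√(2(K − 1)) of K − 1, the standard chi square moments. As a usage example, the normalized R errors of the iterative solution in Run 3 (Table 6.3) give ē_R = −0.04, matching the tabulated average, and K s_R² ≈ 11.22 against an expected 9 ± 4.24.

```python
import math


def normalized_error_summary(e):
    """Sample mean e_bar and K * s^2, with s^2 = (1/K) * sum (e_j - e_bar)^2."""
    K = len(e)
    e_bar = sum(e) / K
    K_s2 = sum((x - e_bar) ** 2 for x in e)
    return e_bar, K_s2


def within_one_sigma(e):
    """Compare e_bar and K s^2 with their null-hypothesis means and standard deviations."""
    K = len(e)
    e_bar, K_s2 = normalized_error_summary(e)
    mean_ok = abs(e_bar) <= 1.0 / math.sqrt(K)                  # e_bar ~ N(0, 1/K)
    var_ok = abs(K_s2 - (K - 1)) <= math.sqrt(2.0 * (K - 1))    # K s^2 ~ chi^2 with K-1 dof
    return mean_ok, var_ok


# Normalized R errors of the iterative solution in Monte Carlo Run 3 (Table 6.3)
e_R = [0.96, 2.18, -0.13, 0.78, -0.58, -0.47, -1.32, -1.42, -0.72, 0.32]
print(normalized_error_summary(e_R))   # approximately (-0.04, 11.22)
print(within_one_sigma(e_R))           # (True, True)
```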

Near Maximum Likelihood 

In Section 4.3 a near maximum likelihood solution for estimating the state and noise covariance parameters was given. In this solution certain terms in the likelihood equations and the conditional information matrix were omitted, and a solution of the resulting "pseudo likelihood equations" was sought.

The solution of these equations was attempted using the iterative procedure of Chapter 3, whereby the conditional information matrix was used as the negative gradient of the likelihood equations. Serious difficulty was encountered in implementing this solution. The information matrix given by (4.3.2) was nearly singular for the system and measurement schedule under study, resulting in an unstable iterative procedure. A different technique was then used to attempt to find a solution point of the pseudo likelihood equations. Essentially, the procedure was to evaluate the score as a function of the a priori values of R and Q for a given value of the true R and Q. The solutions R̂_n and Q̂_n were the values of R̂_o and Q̂_o which produced the smallest magnitude of the score. A sufficient number of values of R̂_o and Q̂_o were chosen to reasonably ensure that R̂_n and Q̂_n produced the smallest, or nearly the smallest, magnitude of the score.

It was found that the solution point agreed quite well with the solution point of the full likelihood equations just given. In other words, the solution of the pseudo likelihood equations is a good estimate of the noise covariance parameters, but a different technique of solution must be used when the information matrix associated with the pseudo likelihood equations is nearly singular. So the computational simplification obtained by omitting certain terms in the likelihood function and information matrix is offset by the need for a more complicated algorithm for finding the solution to the likelihood equations.
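The search over a priori values amounts to evaluating the score on a grid of candidate (R̂_o, Q̂_o) and keeping the point of smallest magnitude. The sketch below shows only that search loop. `toy_score` is a hypothetical linear stand-in for the pseudo likelihood score, which would in practice require running the filter and evaluating (4.3.x) at each grid point, and the grid spacings are arbitrary choices.

```python
import itertools
import math


def grid_minimize_score(score, r_grid, q_grid):
    """Return (R0, Q0, |S|) at the grid point with the smallest score magnitude."""
    best = None
    for r0, q0 in itertools.product(r_grid, q_grid):
        magnitude = math.hypot(*score(r0, q0))
        if best is None or magnitude < best[2]:
            best = (r0, q0, magnitude)
    return best


def toy_score(r0, q0):
    """Hypothetical stand-in for the pseudo likelihood score; zero at (1.2, 0.4)."""
    dr, dq = r0 - 1.2, q0 - 0.4
    return (2.0 * dr + 0.5 * dq, 0.5 * dr + 1.0 * dq)


r_grid = [0.1 * i for i in range(1, 31)]   # candidate a priori R values
q_grid = [0.05 * i for i in range(1, 21)]  # candidate a priori Q values
print(grid_minimize_score(toy_score, r_grid, q_grid))
```

The cost grows with the product of the grid sizes, which is the practical drawback noted above. If the two rows of the toy score were made nearly proportional, many grid points along a line would give almost the same small |S|, which is the non-uniqueness discussed in the following paragraph.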

Whenever the information matrix is singular or nearly singular, there is a real question as to whether there is a unique solution of the likelihood equations. In the case mentioned above, instead of a single point where the magnitude of the score is minimized, there may exist a line in (R̂_o, Q̂_o) space along which the magnitude of the score is small and remains essentially constant. In such situations it is impossible to distinguish between errors in the estimates of R and errors in the estimates of Q.

From the limited simulation of the near maximum likeli- 
hood solution it is felt that for the system and measurement 
schedule used, a unique solution of the pseudo likelihood 
equations does exist. However, finding the solution point 
requires considerable trial and error. Because of this 
complication, no series of runs was conducted in which the 
pseudo likelihood equations were solved. From the few trials 
that were conducted, it is felt that the solutions are quite 
close to the solutions of the full likelihood equations. 

Explicit Suboptimal 

The explicit suboptimal solution of Section 4.4 was 
simulated so that it could be compared with the maximum 
likelihood solution. A series of runs was made that corre- 
sponds to the series made of the maximum likelihood solution. 
Identical random numbers were used in the simulation of the
noises so that a meaningful comparison could be made. 

In Section 4.4, expressions for the theoretical conditional and unconditional mean and mean squared R and Q estimation error were developed. The conditional mean of the R estimation error was given by

$$ E\left( \hat{R}_n^{jj} - R^{jj} \right) = F_n^{jj} $$

and the conditional mean of the Q estimation error was given by

$$ E\left( \hat{Q}_n^{jj} - Q^{jj} \right) = M_n^{jj} $$

where F_n and M_n are defined in Section 4.4. The conditional mean squared R and Q estimation errors were given by

$$ E\left[ (\hat{R}_n^{jj} - R^{jj})^2 \right] = E\left[ \left( \hat{R}_n^{jj} - E(\hat{R}_n^{jj}) \right)^2 \right] + \left[ E(\hat{R}_n^{jj} - R^{jj}) \right]^2 = G_n^{jj} + (F_n^{jj})^2 $$

$$ E\left[ (\hat{Q}_n^{jj} - Q^{jj})^2 \right] = E\left[ \left( \hat{Q}_n^{jj} - E(\hat{Q}_n^{jj}) \right)^2 \right] + \left[ E(\hat{Q}_n^{jj} - Q^{jj}) \right]^2 = J_n^{jj} + (M_n^{jj})^2 $$

where G_n and J_n are defined in Section 4.4. Note that J_n given here is not the conditional information matrix of the maximum likelihood solution. F_n and M_n represent the bias of the estimators, and G_n and J_n represent the variance of the estimators about the biased values.

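One of the purposes of this simulation is to check the
validity of the above expressions. To do this, the following
variables are defined:

$$\sigma_{R^{jj}}^2 = \mathcal{E}\left[\left(\hat{R}_n^{jj} - \mathcal{E}(\hat{R}_n^{jj})\right)^2\right], \qquad \sigma_{Q^{jj}}^2 = \mathcal{E}\left[\left(\hat{Q}_n^{jj} - \mathcal{E}(\hat{Q}_n^{jj})\right)^2\right]$$

$$e_{R^{jj}} = \frac{\hat{R}_n^{jj} - \mathcal{E}(\hat{R}_n^{jj})}{\sigma_{R^{jj}}}, \qquad e_{Q^{jj}} = \frac{\hat{Q}_n^{jj} - \mathcal{E}(\hat{Q}_n^{jj})}{\sigma_{Q^{jj}}}$$

If the expressions for the conditional means and variances of $\hat{R}^{jj}$ and $\hat{Q}^{jj}$
are accurate, the normalized errors $e_{R^{jj}}$ and $e_{Q^{jj}}$ should be
zero mean unit variance random variables.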
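As a check on these definitions, the normalized errors can be recomputed from the tabulated columns. A short Python sketch using the Run 6 R values transcribed from Table 6.6 (the function name is illustrative); small differences from the printed $e_R$ column come from rounding of the tabulated inputs.

```python
# Run 6 values from Table 6.6: estimate, conditional mean, and standard
# deviation about the conditional mean for each of the ten trials.
r_hat = [0.993, 1.099, 0.879, 1.060, 1.054, 0.797, 0.834, 1.029, 0.970, 1.068]
r_cond_mean = [0.965, 1.115, 0.878, 1.027, 1.006, 0.798, 0.837, 1.067, 1.031, 1.046]
sigma_r = [0.0252, 0.0403, 0.0164, 0.0315, 0.0291, 0.0084, 0.0123, 0.0355, 0.0317, 0.0332]

def normalized_errors(estimates, cond_means, sigmas):
    """e = (estimate - conditional mean) / standard deviation, trial by trial."""
    return [(x - m) / s for x, m, s in zip(estimates, cond_means, sigmas)]

e_r = normalized_errors(r_hat, r_cond_mean, sigma_r)
for trial, e in enumerate(e_r, start=1):
    print(f"trial {trial:2d}: e_R = {e:+.3f}")
```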

In maximum likelihood estimation, the unconditional 
mean squared error is usually a nonanalytic function. 
However, in the case of explicit suboptimal estimation, an 
analytic expression for the unconditional mean squared error 
was found. Because of the relatively small sample size, 
this expression is not used. The theoretical mean squared 
estimation error for R and Q that is shown in the tables is 
defined as the average conditional mean squared estimation 
error, averaged over the ensemble of values of R and Q 
actually encountered. This is the same definition of the 
theoretical mean squared estimation error that was used in
the evaluation of the maximum likelihood estimator. When R 
and Q are scalars, the theoretical mean squared estimation 
error is defined by 


$$\frac{1}{K} \sum_{j=1}^{K} \mathcal{E}\left[(\hat{R}_j - R_j)^2\right]$$

where $K$ is the number of trials, $\hat{R}_j$ is the estimate of R on the $j$th trial, and $R_j$
is the true value of R on that trial. A similar expression
is used for the theoretical mean squared Q estimation error.
The actual mean squared R estimation error is defined by

$$\frac{1}{K} \sum_{j=1}^{K} (\hat{R}_j - R_j)^2$$

with a similar expression for the actual mean squared Q
estimation error.
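Using the per-trial decomposition given earlier (variance about the conditional mean plus squared bias), both averages can be computed from the table columns. A minimal Python sketch using the Run 6 R columns of Table 6.6, where for a scalar R the tabulated $\sigma_R^2$ plays the role of $G_n$ and $\mathcal{E}(\hat{R}) - R$ that of $F_n$; the function names are illustrative.

```python
# Run 6 R values from Table 6.6: true value, estimate, conditional mean,
# and standard deviation of the estimate about its conditional mean.
r_true = [0.860, 1.599, 0.522, 1.288, 0.526, 0.102, 0.304, 1.718, 0.160, 0.484]
r_hat = [0.993, 1.099, 0.879, 1.060, 1.054, 0.797, 0.834, 1.029, 0.970, 1.068]
r_cond_mean = [0.965, 1.115, 0.878, 1.027, 1.006, 0.798, 0.837, 1.067, 1.031, 1.046]
sigma_r = [0.0252, 0.0403, 0.0164, 0.0315, 0.0291, 0.0084, 0.0123, 0.0355, 0.0317, 0.0332]

def theoretical_mse(true, cond_mean, sigma):
    """(1/K) * sum of conditional MSE, each = variance + bias**2."""
    k = len(true)
    return sum(s ** 2 + (m - r) ** 2 for r, m, s in zip(true, cond_mean, sigma)) / k

def actual_mse(true, est):
    """(1/K) * sum of squared differences between estimate and true value."""
    k = len(true)
    return sum((x - r) ** 2 for r, x in zip(true, est)) / k

print(f"theoretical: {theoretical_mse(r_true, r_cond_mean, sigma_r):.4f}")
print(f"actual:      {actual_mse(r_true, r_hat):.4f}")
```

With these inputs the two functions return approximately 0.295 and 0.296, against the tabulated 0.294 and 0.296; the small difference in the theoretical value is presumably due to rounding of the tabulated columns.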

In runs 6 and 7 the values of R and Q were selected 
from a Gamma distribution with the same population charac- 
teristics as runs 1 and 2 respectively. The a priori 
estimates of R and Q were chosen to be the theoretical means 
of the appropriate Gamma distributions.
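A minimal sketch of how one trial's true noise variances might be drawn, using Python's `random.gammavariate(alpha, beta)`, whose mean is `alpha * beta`. The shape parameter below is a hypothetical placeholder, since the population characteristics of runs 1 and 2 are specified elsewhere in the thesis; the means 1.0 and 0.5 are the distribution means quoted for runs 6 and 10, and the helper name is illustrative.

```python
import random

def draw_true_covariances(rng, mean_r, mean_q, shape_r, shape_q):
    """Draw one trial's true (R, Q) from Gamma distributions with given means.

    random.gammavariate(alpha, beta) has mean alpha * beta, so the scale
    beta is chosen as mean / alpha.
    """
    r = rng.gammavariate(shape_r, mean_r / shape_r)
    q = rng.gammavariate(shape_q, mean_q / shape_q)
    return r, q

SHAPE = 2.0          # hypothetical shape parameter, not taken from the thesis
MEAN_R, MEAN_Q = 1.0, 0.5

rng = random.Random(6)
trials = [draw_true_covariances(rng, MEAN_R, MEAN_Q, SHAPE, SHAPE) for _ in range(10)]

# The a priori estimates are the theoretical means of the distributions.
r_prior, q_prior = MEAN_R, MEAN_Q
```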

Several things can be seen from an examination of
Figures 6.11 and 6.12 and Tables 6.6 and 6.7. First, there
is good agreement between the actual values of $\hat{R}$ and $\hat{Q}$ and
their conditional means. There is also good agreement between
the theoretical and actual mean squared estimation error as
defined above. This tends to substantiate the validity of the
expressions developed in Chapter 4.

The second thing to notice is that the estimates of R
and Q are biased towards the a priori values $\hat{R}_o$ and $\hat{Q}_o$.
The estimates are to a large degree independent of the actual
values of R and Q on any given trial. Unlike the maximum
likelihood estimator, the explicit suboptimal estimator
remains biased even when the number of measurements becomes
large. This will become even clearer later when the conditional
means of $\hat{R}$ and $\hat{Q}$ are computed as functions of $\hat{R}_o$ and
$\hat{Q}_o$ for fixed true values of R and Q.

In runs 8 and 9, the true values of R and Q were held
fixed on each sample at the means of their respective
distributions. As can be seen from Figures 6.13 and 6.14, when
the a priori values of R and Q are equal to the true values
of R and Q, the estimates $\hat{R}$ and $\hat{Q}$ are quite closely grouped
about the true values. Runs 10 and 11 are repeats of runs 6
and 9, except that the a priori values of R and Q were not
equal to the means of the respective distributions of R and
Q. For run 10, $\hat{R}_o = 2.0$ and $\hat{Q}_o = 1.0$, whereas the distribution
means are $\bar{R} = 1.0$ and $\bar{Q} = 0.5$; for run 11, $\hat{R}_o = 20$ and
$\hat{Q}_o = 2$, whereas $\bar{R} = 10$ and $\bar{Q} = 1$. Again
it can be seen that the estimators for R and Q are biased if
the a priori values $\hat{R}_o$ and $\hat{Q}_o$ are not equal to the means
of their respective distributions, exactly as predicted.
Table 6.6 Monte Carlo Run 6: Explicit Suboptimal Solution

Sample      R       R̂      ℰ(R̂)     σ_R      e_R
1         0.860   0.993   0.965   0.0252   +1.110
2         1.599   1.099   1.115   0.0403   -0.398
3         0.522   0.879   0.878   0.0164   +0.061
4         1.288   1.060   1.027   0.0315   +1.050
5         0.526   1.054   1.006   0.0291   +1.650
6         0.102   0.797   0.798   0.0084   -0.120
7         0.304   0.834   0.837   0.0123   -0.244
8         1.718   1.029   1.067   0.0355   -1.070
9         0.160   0.970   1.031   0.0317   -1.920
10        0.484   1.068   1.046   0.0332   +0.665
Average   0.756   0.978   0.977   0.0264   +0.079

Theoretical Mean Squared R Estimation Error: 0.294
Actual Mean Squared R Estimation Error: 0.296
Table 6.6 (Continued) Monte Carlo Run 6

Sample      Q       Q̂      ℰ(Q̂)     σ_Q      e_Q
1         0.456   0.490   0.444   0.0405   +1.130
2         0.512   0.654   0.686   0.0648   -0.495
3         0.330   0.304   0.302   0.0263   +0.076
4         0.360   0.602   0.544   0.0504   +1.150
5         0.997   0.587   0.511   0.0472   +1.610
6         0.327   0.172   0.173   0.0134   -0.075
7         0.329   0.235   0.236   0.0196   -0.051
8         0.145   0.543   0.607   0.0568   -1.130
9         1.487   0.456   0.553   0.0514   -1.880
10        1.249   0.610   0.577   0.0538   +0.615
Average   0.619   0.465   0.463   0.0424   +0.095

Theoretical Mean Squared Q Estimation Error: 0.189
Actual Mean Squared Q Estimation Error: 0.191
Table 6.7 Monte Carlo Run 7: Explicit Suboptimal Solution

Sample      R        R̂       ℰ(R̂)     σ_R     e_R
1          8.231   11.362   12.280   0.674   -1.360
2          8.104    9.237    9.997   0.447   -1.700
3         10.349    8.452    9.238   0.373   -2.105
4          5.362    7.442    7.419   0.190   +0.121
5         14.526   14.625   13.005   0.748   +2.160
6         11.753   10.269   10.534   0.510   -0.520
7         37.523   16.213   19.195   1.380   -2.160
8         29.982   16.658   16.856   1.140   -0.173
9         10.499   12.729   11.493   0.596   +2.080
10        19.663   12.767   12.637   0.716   +0.182
Average   15.599   11.975   12.166   0.671   -0.348

Theoretical Mean Squared R Estimation Error: 59.344
Actual Mean Squared R Estimation Error: 70.540
Table 6.7 (Continued) Monte Carlo Run 7

Sample      Q       Q̂      ℰ(Q̂)    σ_Q     e_Q
1         3.985   1.285   1.452   0.576   -0.290
2         1.677   0.844   1.002   0.087   -0.182
3         0.089   0.707   0.849   0.072   -0.198
4         0.002   0.503   0.496   0.036   +0.193
5         2.476   1.895   1.588   0.146   +2.100
6         0.923   1.064   1.104   0.097   -0.410
7         0.614   2.183   2.783   0.266   -2.250
8         0.906   2.321   2.330   0.221   -0.046
9         2.361   1.556   1.294   0.117   +2.240
10        0.255   1.509   1.509   0.138   +0.000
Average   1.329   1.387   1.441   0.176   +0.116

Theoretical Mean Squared Q Estimation Error: 1.817
Actual Mean Squared Q Estimation Error: 1.566
Table 6.8 Monte Carlo Run 8: Explicit Suboptimal Solution

Sample     R      R̂      ℰ(R̂)    σ_R      e_R
1         1.0   1.016   1.0   0.0286   +0.560
2         1.0   1.034   1.0   0.0286   +1.190
3         1.0   1.009   1.0   0.0286   +0.315
4         1.0   1.026   1.0   0.0286   +0.910
5         1.0   0.968   1.0   0.0286   -1.120
6         1.0   1.005   1.0   0.0286   +0.175
7         1.0   0.988   1.0   0.0286   -0.420
8         1.0   0.970   1.0   0.0286   -1.050
9         1.0   1.046   1.0   0.0286   +1.610
10        1.0   0.977   1.0   0.0286   -0.805
Average   1.0   1.004   1.0   0.0286   +0.137

Theoretical Mean Squared R Estimation Error: 0.000821
Actual Mean Squared R Estimation Error: 0.000694
Table 6.8 (Continued) Monte Carlo Run 8

Sample     Q      Q̂      ℰ(Q̂)    σ_Q     e_Q
1         0.5   0.529   0.5   0.046   +0.630
2         0.5   0.559   0.5   0.046   +1.280
3         0.5   0.520   0.5   0.046   +0.435
4         0.5   0.546   0.5   0.046   +1.000
5         0.5   0.454   0.5   0.046   -1.000
6         0.5   0.513   0.5   0.046   +0.284
7         0.5   0.483   0.5   0.046   -0.370
8         0.5   0.442   0.5   0.046   -1.260
9         0.5   0.555   0.5   0.046   +1.200
10        0.5   0.458   0.5   0.046   -0.905
Average   0.5   0.506   0.5   0.046   +0.129

Theoretical Mean Squared Q Estimation Error: 0.002126
Actual Mean Squared Q Estimation Error: 0.001755


Table 6.9 Monte Carlo Run 9: Explicit Suboptimal Solution

Sample     R       R̂       ℰ(R̂)    σ_R     e_R
1         10.0   10.218   10.0   0.447   +0.488
2         10.0    9.693   10.0   0.447   -0.687
3         10.0   10.013   10.0   0.447   +0.029
4         10.0   10.450   10.0   0.447   +1.010
5         10.0   10.355   10.0   0.447   +0.795
6         10.0   10.392   10.0   0.447   +0.875
7         10.0   10.600   10.0   0.447   +1.340
8         10.0   10.423   10.0   0.447   +0.950
9         10.0    9.850   10.0   0.447   -0.335
10        10.0    9.542   10.0   0.447   -1.020
Average   10.0   10.154   10.0   0.447   +0.365

Theoretical Mean Squared R Estimation Error: 0.201
Actual Mean Squared R Estimation Error: 0.139
Table 6.9 (Continued) Monte Carlo Run 9

Sample     Q      Q̂      ℰ(Q̂)    σ_Q     e_Q
1         1.0   1.050   1.0   0.087   +0.575
2         1.0   0.935   1.0   0.087   -0.745
3         1.0   1.013   1.0   0.087   +0.150
4         1.0   1.085   1.0   0.087   +0.975
5         1.0   1.071   1.0   0.087   +0.815
6         1.0   1.069   1.0   0.087   +0.795
7         1.0   1.132   1.0   0.087   +1.520
8         1.0   1.076   1.0   0.087   +0.875
9         1.0   0.955   1.0   0.087   -0.517
10        1.0   0.902   1.0   0.087   -1.120
Average   1.0   1.029   1.0   0.087   +0.333

Theoretical Mean Squared Q Estimation Error: 0.00757
Actual Mean Squared Q Estimation Error: 0.00588
Table 6.10 Monte Carlo Run 10: Explicit Suboptimal Solution

Sample      R       R̂      ℰ(R̂)     σ_R      e_R
1         0.859   1.708   1.680   0.0257   +1.090
2         1.599   1.819   1.831   0.0409   -0.294
3         0.533   1.593   1.592   0.0169   +0.059
4         1.288   1.775   1.743   0.0321   +1.000
5         0.526   1.770   1.721   0.0295   +1.660
6         0.102   1.511   1.512   0.0089   -0.112
7         0.304   1.547   1.551   0.0128   -0.031
8         1.718   1.746   1.783   0.0364   -1.020
9         0.160   1.685   1.745   0.0319   -1.880
10        0.484   1.785   1.761   0.0335   +0.715
Average   0.756   1.694   1.692   0.0269   +0.119

Theoretical Mean Squared R Estimation Error: 1.1203
Actual Mean Squared R Estimation Error: 1.1249
Table 6.10 (Continued) Monte Carlo Run 10

Sample      Q       Q̂      ℰ(Q̂)     σ_Q      e_Q
1         0.456   0.531   0.485   0.0404   +1.140
2         0.512   0.695   0.726   0.0648   -0.480
3         0.330   0.346   0.343   0.0262   +0.115
4         0.360   0.643   0.585   0.0505   +1.150
5         0.997   0.628   0.552   0.0472   +1.610
6         0.327   0.214   0.215   0.0134   -0.075
7         0.329   0.276   0.277   0.0196   -0.051
8         0.145   0.585   0.648   0.0568   -1.110
9         1.487   0.498   0.594   0.0514   -1.860
10        1.249   0.652   0.618   0.0538   +0.633
Average   0.619   0.507   0.504   0.0427   +0.107

Theoretical Mean Squared Q Estimation Error: 0.1778
Actual Mean Squared Q Estimation Error: 0.1799



Table 6.11 Monte Carlo Run 11: Explicit Suboptimal Solution

Sample     R       R̂       ℰ(R̂)     σ_R     e_R
1         10.0   15.697   15.543   0.087   +0.378
2         10.0   16.060   15.543   0.087   +1.010
3         10.0   15.729   15.543   0.087   +0.621
4         10.0   16.027   15.543   0.087   +1.340
5         10.0   15.363   15.543   0.087   -0.183
6         10.0   15.449   15.543   0.087   -0.046
7         10.0   15.150   15.543   0.087   -0.820
8         10.0   15.099   15.543   0.087   -0.172
9         10.0   16.248   15.543   0.087   +1.030
10        10.0   15.032   15.543   0.087   -1.150
Average   10.0   15.585   15.543   0.087   +0.201

Theoretical Mean Squared R Estimation Error: 30.931
Actual Mean Squared R Estimation Error: 31.366
Table 6.11 (Continued) Monte Carlo Run 11

Sample     Q      Q̂       ℰ(Q̂)     σ_Q     e_Q
1         1.0   1.167   1.1342   0.450   +0.342
2         1.0   1.222   1.1342   0.450   +1.150
3         1.0   1.188   1.1342   0.450   +0.414
4         1.0   1.241   1.1342   0.450   +1.070
5         1.0   1.118   1.1342   0.450   -0.400
6         1.0   1.130   1.1342   0.450   -0.206
7         1.0   1.062   1.1342   0.450   -0.870
8         1.0   1.019   1.1342   0.450   -0.986
9         1.0   1.224   1.1342   0.450   +1.560
10        1.0   1.034   1.1342   0.450   -1.135
Average   1.0   1.141   1.1342   0.450   +0.120

Theoretical Mean Squared Q Estimation Error: 0.0256
Actual Mean Squared Q Estimation Error: 0.0257
The normalized differences between the R and Q estimates
and their theoretical conditional means were studied using
a procedure similar to that used in testing the normalized
estimation errors of the maximum likelihood estimator. For
each run presented, the mean and variance of these normalized
differences across the ensemble of ten trials were computed.
In most cases, the computed mean and variance of the differences
were within one standard deviation of their expected
values. From this it can reasonably be concluded that the
theoretical expressions for the conditional mean and the conditional
variance of the estimate about the conditional mean
are valid.
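The one-standard-deviation comparison can be sketched as follows, assuming the normalized errors of a run are independent N(0, 1) under the theory, so that the sample mean of K values has standard deviation $1/\sqrt{K}$ and the unbiased sample variance has standard deviation $\sqrt{2/(K-1)}$. These normal-theory spreads are an assumption of this sketch; the exact procedure used for the maximum likelihood estimator is described earlier in the thesis. The function name is illustrative.

```python
import math
import statistics

def check_normalized(errors):
    """Compare the sample mean and variance of normalized errors with N(0, 1).

    Under the assumed independent N(0, 1) model, the sample mean of K
    values has standard deviation 1/sqrt(K) and the unbiased sample
    variance has standard deviation sqrt(2/(K - 1)).
    """
    k = len(errors)
    mean = statistics.fmean(errors)
    var = statistics.variance(errors)   # unbiased: divides by K - 1
    return {
        "mean": mean,
        "variance": var,
        "mean_within_1sd": abs(mean) <= 1.0 / math.sqrt(k),
        "variance_within_1sd": abs(var - 1.0) <= math.sqrt(2.0 / (k - 1)),
    }

# Normalized R errors e_R of Run 6 (Table 6.6).
e_r_run6 = [1.110, -0.398, 0.061, 1.050, 1.650, -0.120, -0.244, -1.070, -1.920, 0.665]
result = check_normalized(e_r_run6)
print(result)
```

For these ten values the sample mean is about 0.078 and the sample variance about 1.17, both within one standard deviation (about 0.32 and 0.47) of their expected values 0 and 1.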

From the Monte Carlo runs presented, it can be seen that
the theoretical results related to the maximum likelihood
solution and the explicit suboptimal solution agree reasonably
well with the actual results of the simulations. These
theoretical results predict the ensemble averages of the
estimation error and mean squared error. Therefore, to study
the behavior of the various estimators, further Monte Carlo
simulations are not necessary. The following are the results of a
statistical evaluation of the maximum likelihood and explicit
suboptimal solutions.

It has been shown that the maximum likelihood estimator 
of the noise covariance parameters is unbiased for any 
values of R and Q that can be encountered. The conditional 
covariance of the estimates about the true values of R and 
Q was shown to be 
$$\operatorname{cov}(\hat{R}, \hat{Q} \mid R, Q) = W_n(R, Q)$$

The conditional average of the estimation error is zero and 
is independent of the actual values of R and Q whereas the 
covariance of the estimation error is a strong function of 
R and Q. 

Figure 6.17 shows the normalized variance of the R and
Q estimators as a function of R for a fixed Q. Figure 6.18
shows the normalized variance as a function of Q for a fixed
R. In both examples, the system and measurement schedule are
those given previously.

Unlike the maximum likelihood estimator, the conditional
average of the estimation error for the explicit suboptimal
estimator is a strong function of the true and a priori values
of R and Q. Figure 6.19 shows the variation of the conditional
averages of $\hat{R}_n$ and $\hat{Q}_n$ as a function of the a priori
estimate $\hat{R}_o$, for fixed $R = 1$, $Q = 0.5$, and $\hat{Q}_o = 0.5$. It
can be seen that only when the a priori estimate of R is
exactly equal to the true value of R are the conditional
means of $\hat{R}$ and $\hat{Q}$ equal to the true values of R and Q. Thus,
if the a priori estimate of R is not equal to the
true value of R, the explicit suboptimal estimators for R
and Q are highly biased, and the amount of the bias can be read
from this graph.

Figure 6.20 shows the variation in these conditional
averages as a function of $\hat{Q}_o$, for fixed $R = 1$, $Q = 0.5$, and
$\hat{R}_o = 1$. The same general conclusions can be drawn from this
graph concerning the bias of the estimator when the a priori
value of Q is not equal to the true value.

The maximum likelihood estimator of the noise covariance 
parameters is slightly biased towards the a priori estimates 
of R and Q when the number of measurements is small. How- 
ever, it was shown that as the number of measurements becomes 
large, the effect of this initial condition bias becomes 
arbitrarily small. The same is not true for the explicit 
suboptimal estimator. If the estimator is biased towards 
the initial estimates, this bias does not necessarily decrease 
as the number of measurements increases. The explicit estimator
is often unable to distinguish between an error in R
and an error in Q, and consequently the estimates of R and Q
may be biased no matter how many measurements are taken.

As was mentioned in Chapter 4, the fact that the explicit
suboptimal estimator is highly biased with respect to the
a priori estimates of R and Q does not mean that no useful
information can be obtained from it. Indeed, the very
fact that it is so highly biased when the a priori estimates
are incorrect can be the basis for estimating the true values
of R and Q. As will be shown, the variance of the estimators
about the possibly biased values is quite small, so that if
the estimates obtained from the explicit suboptimal estimator
differ by an appreciable amount from the a priori estimates,
there is good justification for concluding that the a priori
estimates are in error. Unfortunately, the explicit suboptimal
estimators do not provide any information concerning
how to correct the a priori values to make this discrepancy
smaller. In any actual situation, the a priori values of R
and Q would have to be adjusted in a trial and error fashion
to attempt to make the estimated values of R and Q equal to
the a priori values. The values of $\hat{R}_o$ and $\hat{Q}_o$ which make
$\hat{R}_n = \hat{R}_o$ and $\hat{Q}_n = \hat{Q}_o$ to the desired degree of accuracy are
then the best estimates for R and Q.
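One simple trial-and-error scheme, not prescribed by the thesis, is a fixed-point iteration: rerun the estimator with its own latest estimates as the new a priori values until the two agree. The sketch below illustrates only the iteration, using a hypothetical linear stand-in for the conditional means of the explicit estimator with invented weights that pull the estimates toward the a priori values; a real application would rerun the Section 4.4 estimator on the data at every step, and convergence is not guaranteed in general. All names are illustrative.

```python
def toy_conditional_means(r0, q0, r_true=1.0, q_true=0.5):
    """HYPOTHETICAL stand-in for the explicit estimator's conditional means.

    Each estimate is the a priori value moved part of the way toward the
    true values; the weights are invented for illustration only. The true
    values are the only point where the estimates equal the a priori values.
    """
    r_hat = r0 + 0.25 * (r_true - r0) + 0.05 * (q_true - q0)
    q_hat = q0 + 0.10 * (r_true - r0) + 0.30 * (q_true - q0)
    return r_hat, q_hat

def adjust_priors(r0, q0, estimator, tol=1e-6, max_iter=500):
    """Replace the a priori values by the latest estimates until they agree."""
    for _ in range(max_iter):
        r_hat, q_hat = estimator(r0, q0)
        if abs(r_hat - r0) < tol and abs(q_hat - q0) < tol:
            return r_hat, q_hat
        r0, q0 = r_hat, q_hat
    raise RuntimeError("a priori values did not converge")

# Start from the run 10 a priori values; the toy truth is R = 1, Q = 0.5.
r_est, q_est = adjust_priors(2.0, 1.0, toy_conditional_means)
print(f"R = {r_est:.4f}, Q = {q_est:.4f}")
```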

In Section 4.4, expressions for the conditional mean
squared estimation error of the estimators for R and Q were
developed. As was mentioned, part of this error comes from
possible bias in the estimator and part comes from possible
variations about this bias. Figure 6.21 shows the variance
of the estimator about the biased values as a function of the
a priori estimates $\hat{R}_o$ and $\hat{Q}_o$, for fixed values of R and Q.
The bias can be found from the previous Figures 6.19 and 6.20.
$\sigma_Q^2(\hat{Q}_o)$ represents the variance of $\hat{Q}$ as a function of $\hat{Q}_o$ for
a fixed $\hat{R}_o$, and $\sigma_Q^2(\hat{R}_o)$ represents the variance of $\hat{Q}$ as a function
of $\hat{R}_o$ for a fixed $\hat{Q}_o$. Similarly, $\sigma_R^2(\hat{R}_o)$ represents the
variance of $\hat{R}$ as a function of $\hat{R}_o$ for a fixed $\hat{Q}_o$, and $\sigma_R^2(\hat{Q}_o)$
represents the variance of $\hat{R}$ as a function of $\hat{Q}_o$ for a fixed $\hat{R}_o$.
6.5 Testing of Statistical Hypotheses

In Chapter 5 various techniques for testing statistical
hypotheses were described. In this section numerical results 
of a simulation of these hypothesis tests are described. 

As mentioned in Section 5.7, the measurement residual
$\Delta z_k = z_k - H_k \hat{x}_{k|k-1}$ is a good test variable upon which hypothesis
tests can be conducted. If the values of R and Q that are
used to compute the state estimate $\hat{x}_{k|k-1}$ are equal to the
true values, then the measurement residual is a zero mean
normally distributed random variable with covariance

$$\operatorname{cov}(\Delta z_k \mid R, Q) = R + H_k P_{k|k-1} H_k^T$$

where $P_{k|k-1}$ is the covariance of the prediction error $x_k - \hat{x}_{k|k-1}$.

However, if the measurement and driving noises are not
zero mean normal variables with known covariances, then the
measurement residuals may not be zero mean with covariance
as given above.
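For a scalar random walk observed directly (a hypothetical example system with $H_k = 1$ and unit transition, not the system used in the simulations above), the residual and its computed variance can be generated as follows. With the correct noise variances the normalized residuals $\Delta z_k / \sqrt{R + P_{k|k-1}}$ should be approximately zero mean with unit variance; with a wrong R they are not. The function names are illustrative.

```python
import random
import statistics

def kalman_residuals(z, r0, q0, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x[k+1] = x[k] + w[k], z[k] = x[k] + v[k].

    The gains are formed from the assumed variances r0 and q0 (H = 1,
    Phi = 1). Returns the residuals dz[k] = z[k] - x_pred[k] and their
    computed variances b[k] = r0 + p_pred[k].
    """
    x_pred, p_pred = x0, p0
    residuals, variances = [], []
    for zk in z:
        dz = zk - x_pred            # measurement residual
        bk = r0 + p_pred            # computed residual variance
        gain = p_pred / bk
        residuals.append(dz)
        variances.append(bk)
        x_upd = x_pred + gain * dz  # measurement update
        p_upd = (1.0 - gain) * p_pred
        x_pred, p_pred = x_upd, p_upd + q0   # time update
    return residuals, variances

def simulate(n, r, q, seed, p0=1.0):
    """Generate n measurements of a random walk with true variances r, q."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, p0 ** 0.5)
    z = []
    for _ in range(n):
        z.append(x + rng.gauss(0.0, r ** 0.5))
        x += rng.gauss(0.0, q ** 0.5)
    return z

z = simulate(20000, r=1.0, q=0.5, seed=1)
dz, b = kalman_residuals(z, r0=1.0, q0=0.5)          # correct R and Q
normalized = [d / bk ** 0.5 for d, bk in zip(dz, b)]
print(statistics.fmean(normalized), statistics.pvariance(normalized))
```

Rerunning `kalman_residuals(z, r0=10.0, q0=0.5)` on the same data (a tenfold error in R) gives normalized residuals whose variance falls well below one (about 0.2 in steady state for this system).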

Two hypothesis tests were devised to test hypotheses on 
the values of R and Q used to compute the measurement resi- 
dual gains and to test the hypothesis concerning the unbiased- 
ness of the measurement and driving noises. The first of 
these two tests will now be described. 

Suppose $\hat{R}_o$ and $\hat{Q}_o$ are the a priori estimates of the
measurement and driving noise covariance matrices and a
maximum likelihood estimate of the state is computed as a
function of the measurements and these a priori values of
R and Q. In Section 2.3, the recursive equations for
computing the state estimate and its "computed" covariance
matrix are given. It was shown that only under the null
hypothesis $\hat{R}_o = R$ and $\hat{Q}_o = Q$ does the computed covariance
matrix accurately represent the covariance of the estimation
error. It was also shown that the measurement residual has
a zero mean even under departures from the null hypothesis,
but only if $\hat{R}_o = R$ and $\hat{Q}_o = Q$ are the residuals at a time k
independent of the residuals at a time j, for $k \neq j$. Therefore,
with $\Delta z_k^* = z_k - H_k \hat{x}_{k|k-1}^*$,

$$\mathcal{E}(\Delta z_k^*) = 0$$

$$\mathcal{E}(\Delta z_k^* \Delta z_k^{*T}) = R + H_k P_{k|k-1} H_k^T$$

where the above conditional expected values are conditioned
upon the fact that $\hat{R}_o$ and $\hat{Q}_o$ are used to compute the weighting
matrices for the measurements, whereas the true values are
R and Q. $P_{k|k-1}$ is the "true" state estimation error covariance
and is not equal to the "computed" error covariance
matrix except under the null hypothesis.

Under the null hypothesis,

$$\mathcal{E}(\Delta z_k^* \Delta z_j^{*T}) = 0 \quad \text{for } k \neq j$$
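Consider the variable

$$t_n = \frac{1}{\sqrt{n}} \sum_{k=1}^{n} \left(B_k^*\right)^{-1/2} \left(z_k - H_k \hat{x}_{k|k-1}^*\right)$$

where

$$B_k^* = \hat{R}_o + H_k P_{k|k-1}^* H_k^T$$

and $P_{k|k-1}^*$ is the computed error covariance. Then under the null hypothesis, $t_n$ is a zero mean normal
random variable with covariance

$$\operatorname{cov}(t_n) = \mathcal{E}(t_n t_n^T) = \frac{1}{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \left(B_j^*\right)^{-1/2} \mathcal{E}(\Delta z_j^* \Delta z_k^{*T}) \left(B_k^*\right)^{-T/2} = \frac{1}{n} \sum_{k=1}^{n} \left(B_k^*\right)^{-1/2} B_k^* \left(B_k^*\right)^{-T/2} = I$$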


Therefore, $t_n$ is a zero mean normal variable with covariance
I. Since the components of the normal vector $t_n$ are uncorrelated,
each is statistically independent of the others, and an independent
test of each component is possible. Using the procedures of Section
5.4 concerning tests on the mean, a critical region $(-t_0, t_0)$
can be defined such that under the null hypothesis, the
probability of the test variable $t_n$ being in this region is
$1 - \alpha$, where $\alpha$ is the level of significance of the test.
The test variable $t_n$ can thus be used to test the hypothesis that the
normalized residual is zero mean with covariance I. A failure in this
test can be caused by a bias in the measurement or driving
noises or by incorrect values of R and Q used to compute $B_k^*$,
which is used to normalize the residuals $\Delta z_k^*$.
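For a scalar measurement the mean test reduces to a few lines. A minimal Python sketch, assuming an equal-tailed two-sided region $(-t_0, t_0)$ and reporting a pass/fail flag (1 for failure) at each of several significance levels, with the 5, 10, and 20 percent levels used in Table 6.12 as defaults; the function name is illustrative, and `statistics.NormalDist.inv_cdf` supplies the normal quantile.

```python
import math
from statistics import NormalDist

def mean_test(residuals, variances, levels=(0.05, 0.10, 0.20)):
    """Scalar mean test on residuals dz_k with computed variances B_k.

    t_n = (1/sqrt(n)) * sum_k dz_k / sqrt(B_k) is N(0, 1) under the null
    hypothesis. At level alpha the region is (-t0, t0) with
    t0 = Phi^{-1}(1 - alpha/2); a value outside it is a failure (flag 1).
    """
    n = len(residuals)
    t_n = sum(dz / math.sqrt(b) for dz, b in zip(residuals, variances)) / math.sqrt(n)
    flags = {}
    for alpha in levels:
        t0 = NormalDist().inv_cdf(1.0 - alpha / 2.0)
        flags[alpha] = int(abs(t_n) > t0)
    return t_n, flags

# Illustrative residuals with unit computed variance (not data from the thesis).
t_n, m_fail = mean_test([1.5, 1.5, 0.3, 0.0], [1.0, 1.0, 1.0, 1.0])
# t_n = 1.65: outside the 10% and 20% regions, inside the 5% region.
```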

Now consider the test variable

$$\chi_n^2 = \sum_{k=1}^{n} \left(B_k^*\right)^{-1/2} \Delta z_k^* \Delta z_k^{*T} \left(B_k^*\right)^{-T/2}$$

Under the null hypothesis

$$\mathcal{E}(\chi_n^2) = \sum_{k=1}^{n} \left(B_k^*\right)^{-1/2} \mathcal{E}(\Delta z_k^* \Delta z_k^{*T}) \left(B_k^*\right)^{-T/2} = nI$$

Since the $\Delta z_k^*$ are normally distributed random variables, the
diagonal elements of $\chi_n^2$ can be shown to be independent
chi-square distributed variables under the null hypothesis,
with n degrees of freedom: each is the sum of the squares of n
independent standard normal variables, consistent with its
expected value n. Using the procedures outlined
in Section 5.5 concerning tests on the variance, a critical
region for each diagonal element of $\chi_n^2$ can be defined such
that under the null hypothesis the probability that the test
variable lies within this critical region is $1 - \alpha$, where $\alpha$
is the level of significance of the test. The test variable
$\chi_n^2$ can then be used to test the hypothesis that the residuals
are zero mean normally distributed random variables with
covariance $B_k^*$. A failure of this test can be caused by a
bias in the measurement or driving noises or by incorrect values
of R and Q used to compute $B_k^*$.
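A matching sketch of the scalar variance test follows. Two assumptions are made here rather than taken from Section 5.5: the region at level $\alpha$ is the equal-tailed interval between the $\alpha/2$ and $1 - \alpha/2$ chi-square quantiles, and those quantiles come from the Wilson-Hilferty approximation (used only to stay within the Python standard library; `scipy.stats.chi2.ppf` gives exact values). The degrees of freedom default to n, the number of terms in the sum, and can be overridden. Function names are illustrative.

```python
import math
from statistics import NormalDist

def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to the p-quantile of chi-square(dof)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3

def variance_test(residuals, variances, levels=(0.05, 0.10, 0.20), dof=None):
    """Scalar variance test: chi2_n = sum_k dz_k**2 / B_k.

    The region at level alpha is taken as the equal-tailed interval
    between the alpha/2 and 1 - alpha/2 quantiles; a value outside it
    is a failure (flag 1). dof defaults to n, the number of terms.
    """
    n = len(residuals)
    dof = n if dof is None else dof
    chi2_n = sum(dz * dz / b for dz, b in zip(residuals, variances))
    flags = {}
    for alpha in levels:
        lo = chi2_quantile(alpha / 2.0, dof)
        hi = chi2_quantile(1.0 - alpha / 2.0, dof)
        flags[alpha] = int(not (lo < chi2_n < hi))
    return chi2_n, flags

# Residuals twice as large as their computed standard deviation.
chi2_n, s_fail = variance_test([2.0] * 10, [1.0] * 10)
```

Here $\chi_n^2 = 40$ lies far above every upper limit (about 20.5 at the 5 percent level for 10 degrees of freedom), so the test fails at all three levels.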

Table 6.12 shows the results of such tests of hypotheses 
on the values of R and Q. Shown are the true values of R
and Q along with the a priori values of R and Q which were 
used to compute the proper test variables. The two columns, 
M-fail and S-fail, indicate whether or not the mean and 
variance test variables failed the appropriate test on the 
5, 10, and 20 percent levels. A "1" indicates a test 
failure and a "0" indicates a passing test. 

As can be seen, the mean test was not highly sensitive 
to departures from the null hypothesis on the values of R 
and Q. Only in extreme cases did the mean test fail, and 
then only on the 10 and 20 percent levels. 

However, as would be desired, the variance test was 
very sensitive to moderate departures from the null hypo- 
thesis, thus indicating a powerful test of the hypothesis. 

Another series of hypothesis tests was conducted to see
if the above hypothesis tests could detect a bias in either
the measurement or driving noises. In Chapter 2, it was
shown that maximum likelihood state estimation can be adversely
affected if it is assumed that the measurement and
driving noises are zero mean when in fact they are not zero
mean. For this test, it was assumed that the measurement and
driving noise covariance matrices were precisely known, but
there was a bias in either of the two noises. In other words,
hypotheses on the means of the measurement and driving noises
are being tested. The results of these tests are shown in
Table 6.13, which lists the actual measurement noise bias and its
hypothesized value, and the true driving noise bias and its
hypothesized value. The system and measurement
schedule are those given previously.

The variances of the measurement and driving noises 
about any possible biases were 10 and 1 respectively. It 
would be expected that only when the biases are comparable 
to the standard deviation of the noises would the tests 
indicate a failure. This was indeed the case. As can be 
seen, the mean test was somewhat more powerful in detecting 
departures from the hypothesis about the noise biases, but 
because of the non-independence of the tests, the variance
test also indicated failure if the difference between the 
true bias and the hypothesized bias was sufficiently large. 

These hypothesis test runs are not meant to be all 
inclusive but rather indicate that with only a moderate 
expenditure of computation, powerful tests on hypotheses 
concerning the unbiasedness and covariance of the measurement 
and driving noises can be implemented. The tests do not 
tell why the particular test failed, but they do indicate 
that one or more of the underlying assumptions about the 
system or measurements is in error. The tests might also be 
used to test hypotheses concerning the values of certain 
elements of the transition matrices, measurement matrices $H_k$,
or any other parameter which is used to describe the system. 
These runs are merely meant to test the feasibility of using 
hypothesis tests in real time to indicate a failure of certain 
assumptions about the environment under which the estimation 
process is taking place. 
Chapter 7
CONCLUSION 


7.1 Summary of Results

The technique of maximum likelihood estimation has been 
shown to be effective in estimating the state and statistics 
of the measurement and driving noises in a linear dynamical 
system observed by linear noisy measurements. Theoretical 
and empirical results indicate that the estimator of the noise 
covariance parameters is asymptotically unique, unbiased, 
consistent, and efficient. However, the solution of the 
likelihood equations for the state and noise statistics 
generally requires considerably more computation than that 
normally involved in estimating the state of the system when 
the noise statistics are presumed known. For this reason 
the optimal procedures requiring an iterative solution of 
the likelihood equations will probably find their greatest 
application in data reduction rather than real time estimation 
problems.

In many cases, a linearized solution of the likelihood 
equations is quite adequate and can be used if a real time 
solution of the estimation problem is required. Of the sub- 
optimal techniques studied, the linearized maximum likelihood 
solution is the only generally applicable technique that is 
effective for the real time estimation of the state and noise 
covariance parameters. The other techniques for estimating 
the noise covariance parameters are either biased with
respect to the initial estimates of the parameters or result 
in possibly non-unique solutions. 

Any technique for the estimation of the noise covariance 
parameters requires some additional computation. Therefore, 
before any estimation of these quantities is undertaken, 
there should be some indication that the a priori values are 
sufficiently in error to substantially reduce the effective- 
ness of the state estimation procedure. It has been shown 
that there are several techniques for testing certain hypo- 
theses concerning the values of the noise statistics which 
allow a decision to be made concerning the correctness of 
the a priori estimates of these parameters. 

7.2 Suggestions for Future Study

In Chapters 3 and 4 techniques for the estimation of 
noise covariance parameters were developed under the assump- 
tions that the measurement and driving noises were independent 
zero mean normally distributed random variables with diagonal, 
time invariant covariance matrices. These assumptions were 
made to simplify the estimation problem while not overly 
restricting the applicability of the solution. However, the 
techniques discussed can be extended to include cases when 
these assumptions are not valid. A similar structure of the 
problem must be retained so that definitive results can be 
obtained. That is, the dynamics of the state are still
described by a linear differential or difference equation 
with normally distributed driving noise and the measurements 
are linear functions of the state with additive normally
distributed measurement noise. In this section, the follow- 
ing cases will be briefly studied: 

1) non-diagonal noise covariance matrices 

2) time varying noise covariance matrices 

3) estimation of more general parameters, such as 
elements of the state transition matrix 

Possible extensions of the explicit suboptimal estimator 
will be discussed first. 

The extension of the explicit suboptimal estimator to
the case of non-diagonal noise covariance matrices is
straightforward. In the expressions of Chapter 4 for estimators of
the diagonal elements $\hat{R}^{jj}$ and $\hat{Q}^{jj}$, all that need be done is
change the indices jj to ij, with appropriate changes in
the indices appearing on the right hand side of these
equations. The expressions for the conditional and unconditional
means of the estimators can easily be modified to include 
this generalization. However, extension of the expressions 
for the conditional and unconditional mean squared error of 
the estimators would be exceedingly difficult. 

The case of time varying noise covariance parameters is 
considerably more difficult to treat. If it is assumed that 
R and Q vary slowly with time compared to the rate of data 
accumulation, then the total estimation time can be broken 
into segments and an independent estimate of R and Q obtained 
from the data gathered in each time segment. Alternatively, 
a different weighting of the measurement data could be
proposed such that data taken in the distant past is essen- 
tially not used in the estimate of the covariance parameters. 
A third procedure could be used to model the noise covariance 
parameters in a way described by Smith and outlined in 
Chapter 4. In that case, the noise covariance parameters are 
assumed to be of the form

$$R_n = k\,R_n^{\mathrm{nom}}$$

where $R_n^{\mathrm{nom}}$ is some nominal value of the measurement noise
covariance at time n and k is an unknown time invariant
precision factor associated with $R_n$. A similar equation
could be used for the driving noise covariance matrices.

The estimation problem is then reduced to estimating certain 
constants associated with each unknown noise covariance 
parameter. Any more general time variation of the noise 
covariance parameters than those outlined above cannot be 
adequately treated using the explicit suboptimal estimator. 
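The first two remedies for slowly varying noise statistics are independent estimates over time segments and a weighting that discounts old data. Both can be sketched for a scalar measurement noise variance. This is a minimal illustration, not the estimator of Chapter 4. It assumes the zero-mean residuals `v` are pure measurement noise (for example, the state is known), so a mean of squares estimates R. The function names and the fading factor `lam` are hypothetical.

```python
import numpy as np


def segment_variance(v, seg_len):
    """Independent estimate of R for each block of seg_len residuals.

    Assumes zero-mean residuals and R (nearly) constant within a block, so the
    mean of squares over the block is the ML variance estimate for that block.
    """
    v = np.asarray(v, dtype=float)
    n_seg = len(v) // seg_len                  # drop an incomplete final block
    blocks = v[: n_seg * seg_len].reshape(n_seg, seg_len)
    return np.mean(blocks**2, axis=1)


def fading_variance(v, lam, r0):
    """Exponentially weighted estimate: each step discounts older data by lam."""
    est = np.empty(len(v))
    r_hat = r0
    for n, vn in enumerate(v):
        r_hat = lam * r_hat + (1.0 - lam) * vn**2
        est[n] = r_hat
    return est


# Hypothetical example: R jumps from 1 to 4 halfway through the record.
rng = np.random.default_rng(0)
R_true = np.where(np.arange(4000) < 2000, 1.0, 4.0)
v = rng.normal(0.0, np.sqrt(R_true))
print(segment_variance(v, 500))                # early blocks near 1, late blocks near 4
print(fading_variance(v, 0.99, 1.0)[-1])
```

The segment length and `lam` both trade tracking speed against the variance of the estimate. This is the same tradeoff noted above: each segment must be long enough to gather sufficient information.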

There is no real possibility that the explicit estimator 
can be used to estimate more general parameters of the system 
or measurement. The estimation equations were derived with 
the particular goal of estimating the measurement and driving 
noise covariance matrices and accordingly cannot be modified 
to include the estimation of other system parameters. 

The technique of maximum likelihood offers a procedure 
and formalism within which any of the extensions mentioned 
above can be handled. The resulting equations may be so 
complicated that a solution may not be practical, but at
least the equations for an optimal solution of the problem 
can be derived. 

If the measurement and driving noise covariance matrices 
are not assumed to be diagonal, then additional likelihood 
equations must be derived for estimating these off diagonal 
elements. This can be done quite easily. In addition to the 
likelihood equations for the state and diagonal elements of 
R and Q, one additional equation is needed for each off diag- 
onal element of R and Q that is to be estimated. This 
equation has the form 


$$\left.\frac{\partial L_n(R,Q,x_n,Z_n)}{\partial R^{jk}}\right|_{R\to\hat R_n,\;Q\to\hat Q_n,\;x_n\to\hat x_{n|n}} = 0$$

with a similar equation for the off diagonal elements of Q.

$L_n(R,Q,x_n,Z_n)$ is the logarithm of the appropriate likelihood
function as derived in Chapter 3. The choice of likelihood 
functions is determined by whether a priori information about 
the noise covariance parameters is to be utilized. 
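The role of the off diagonal likelihood equation can be checked numerically in the simplest setting. The sketch below assumes zero-mean noise samples are observed directly, so the state drops out. That is a simplification of the likelihood functions of Chapter 3. Under this assumption, the derivative of the log likelihood with respect to a symmetric off diagonal element $R^{jk}$ vanishes at the sample covariance. The function names are hypothetical.

```python
import numpy as np


def log_likelihood(R, V):
    """Log of the zero-mean Gaussian likelihood of the rows of V (constant dropped)."""
    N = V.shape[0]
    _, logdet = np.linalg.slogdet(R)
    quad = np.einsum("ni,ij,nj->", V, np.linalg.inv(R), V)
    return -0.5 * N * logdet - 0.5 * quad


def offdiag_score(R, V, j, k, h=1e-6):
    """Central-difference dL/dR^{jk}; R^{jk} and R^{kj} move together so R stays symmetric."""
    E = np.zeros_like(R)
    E[j, k] = E[k, j] = 1.0
    return (log_likelihood(R + h * E, V) - log_likelihood(R - h * E, V)) / (2.0 * h)


rng = np.random.default_rng(1)
R_true = np.array([[2.0, 0.8], [0.8, 1.0]])
V = rng.multivariate_normal(np.zeros(2), R_true, size=5000)
S = V.T @ V / V.shape[0]                       # unconstrained ML estimate of R
print(offdiag_score(S, V, 0, 1))               # close to zero: the likelihood equation holds
```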

As was the case of the explicit suboptimal estimator, 
the case of time varying noise covariance parameters is more 
difficult to treat. Again if it is assumed that the time 
variation is slow, then the total estimation time can be 
divided into segments and an independent estimate of the noise 
covariance parameters obtained assuming that R and Q are 
essentially constant over this time segment. Of course, the
time segment over which R and Q are constant must be long 
enough to allow sufficient information to be gathered to 
obtain a reasonably good estimate of R and Q. 

Alternatively, if the noise covariance parameters are 
assumed to change with time in a deterministic manner as 
proposed by Smith, the technique of maximum likelihood can 
easily be applied to estimate the value of the unknown 
precision factors $k^j$, with a separate precision factor
associated with each unknown element of R and Q. In this
case, the likelihood function can be thought of as a function
of the parameters $k^j$ rather than of R and Q. For each $k^j$, an
equation of the following form must be solved:


$$\left.\frac{\partial L_n(k^1,\dots,k^m,x_n,Z_n)}{\partial k^j}\right|_{x_n\to\hat x_{n|n},\;k\to\hat k_n} = 0$$


where here it has been assumed that there are m such preci- 
sion factors. The solution of this equation can be obtained 
in a manner entirely analogous to the solution for the time 
invariant covariance parameters discussed in Chapter 3. 

There is also the likelihood equation associated with the
state $x_n$, which must be solved simultaneously with the
likelihood equations for the parameters $k^j$.
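In the simplest case the likelihood equation for a precision factor has a closed-form root. The sketch below assumes a single scalar factor k and residuals $v_n \sim N(0, k\,R_n^{\mathrm{nom}})$ that are independent and do not involve the state. With these assumptions, setting $\partial L/\partial k = 0$ for $L(k) = -\tfrac12\sum_n[\ln(k R_n^{\mathrm{nom}}) + v_n^2/(k R_n^{\mathrm{nom}})]$ gives $\hat k = \frac1N\sum_n v_n^2/R_n^{\mathrm{nom}}$. The nominal time history below is hypothetical.

```python
import numpy as np


def precision_factor_mle(v, r_nom):
    """ML estimate of k when v_n ~ N(0, k * r_nom_n) independently.

    Setting dL/dk = 0 for L(k) = -1/2 * sum[ln(k r_n) + v_n^2 / (k r_n)]
    gives k_hat = mean(v_n^2 / r_n).
    """
    v = np.asarray(v, dtype=float)
    r_nom = np.asarray(r_nom, dtype=float)
    return np.mean(v**2 / r_nom)


def score_k(k, v, r_nom):
    """dL/dk, which should vanish at the ML estimate."""
    v = np.asarray(v, dtype=float)
    r_nom = np.asarray(r_nom, dtype=float)
    return -0.5 * np.sum(1.0 / k - v**2 / (k**2 * r_nom))


rng = np.random.default_rng(2)
n = np.arange(2000)
r_nom = 1.0 + 0.5 * np.sin(2 * np.pi * n / 500)    # known nominal time history
v = rng.normal(0.0, np.sqrt(2.5 * r_nom))           # true k = 2.5
k_hat = precision_factor_mle(v, r_nom)
print(k_hat, score_k(k_hat, v, r_nom))
```

When the state also enters the likelihood, this equation no longer decouples and must be iterated together with the state equation, as stated above.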

Much more work needs to be done in the area of maximum 
likelihood estimation when the time variation of R and Q is 
more complicated than the cases given above. It is felt 
that there is much promise of obtaining an optimal solution
to the problem. Such a solution might proceed along the 
following lines. 

Let $\xi_n$ represent the vector of diagonal elements of the
noise covariance matrices at time n:

$$\xi_n^T = \left[R_n^{11},\dots,R_n^{yy},\;Q_n^{11},\dots,Q_n^{\eta\eta}\right]$$

Let $\xi_{n-1}$ be the vector of diagonal elements of these matrices
at time n-1, and let the relationship between $\xi_n$ and $\xi_{n-1}$ be
given by

$$\xi_n = \Psi(n,n-1)\,\xi_{n-1} + u_n \qquad (7.2.1)$$

where $\Psi(n,n-1)$ is the "noise covariance parameter transition
matrix" and $u_n$ is the "noise covariance parameter driving
noise." Since $\xi_n$ represents a vector of noise variances,
every element of $\xi_n$ must be positive. Therefore, the
distribution of the noise $u_n$ must be chosen so that for any
$\Psi(n,n-1)$ and $\xi_{n-1}$ the elements of $\xi_n$ are all positive. In general
this would require that the distribution of $u_n$ be a function
of $\Psi(n,n-1)$ and $\xi_{n-1}$. However, these problems can be
avoided if it is assumed that the elements of $u_n$ are chosen
from a distribution that is independent of $\Psi(n,n-1)$ and $\xi_{n-1}$
and only allows positive values for the elements of $u_n$, with the
elements of $\Psi(n,n-1)$ taken to be nonnegative. Such
a distribution might be the Gamma distribution used in
Chapter 3. Note that this choice allows $\xi_n$ to decrease as well
as increase from time n-1 to time n. If on the average $\xi_n$
is to be equal to $\xi_{n-1}$, then the parameters $\Psi(n,n-1)$ and
the distribution of $u_n$ should be chosen so that

$$\left(I - \Psi(n,n-1)\right)\bar\xi_n = \bar u_n$$

where $\bar u_n$ is the mean of the $u_n$ distribution and $\bar\xi_n$ is the
average value of $\xi_n$. If the noise $u_n$ on a given trial is
less than its mean, then, for $\xi_{n-1}$ at its mean value, $\xi_n$ will be less than its mean value.

It might also be reasonably assumed that $\Psi(n,n-1)$ is
a diagonal matrix so that, if the elements of $u_n$ and $\xi_{n-1}$ are
mutually independent, the elements of $\xi_n$ will also be
independent.

If $\Psi(n,n-1)$ is a zero matrix, then $\xi_n$ is completely
independent of $\xi_{n-1}$, whereas if $u_n$ is not present, $\xi_n$ is a
deterministic function of $\xi_{n-1}$. All other cases between
these two extremes can be handled by appropriate choice of
$\Psi(n,n-1)$ and the parameters of the distribution of $u_n$. It
can be shown that if $\xi_{n-1}$ and $u_n$ have Gamma distributions
and are mutually independent, then $\xi_n$ has a Gamma distribution,
provided (element by element, with $\Psi$ diagonal) the scaled term
$\Psi(n,n-1)\,\xi_{n-1}$ and $u_n$ share a common scale parameter.
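The scalar form of the model (7.2.1) can be simulated to see the two properties claimed for it. First, positivity is preserved when $u_n$ is Gamma distributed and $\Psi \ge 0$. Second, the long-run mean satisfies $(1-\Psi)\bar\xi = \bar u$. This is a sketch only. The values of psi and the Gamma shape and scale are hypothetical, and only a single scalar noise variance is modeled.

```python
import numpy as np


def simulate_xi(psi, shape, scale, xi0, n_steps, rng):
    """Simulate the scalar model xi_n = psi * xi_{n-1} + u_n, u_n ~ Gamma(shape, scale).

    With psi >= 0 and xi0 > 0, every xi_n is positive because each u_n > 0.
    """
    xi = np.empty(n_steps + 1)
    xi[0] = xi0
    u = rng.gamma(shape, scale, size=n_steps)
    for n in range(1, n_steps + 1):
        xi[n] = psi * xi[n - 1] + u[n - 1]
    return xi


psi, shape, scale = 0.9, 2.0, 0.05
u_bar = shape * scale                          # mean of the Gamma driving noise
xi_bar = u_bar / (1.0 - psi)                   # from (1 - psi) * xi_bar = u_bar
xi = simulate_xi(psi, shape, scale, xi_bar, 20000, np.random.default_rng(3))
print(xi.min() > 0, xi.mean(), xi_bar)
```

Values of psi near 1 make $\xi_n$ strongly correlated from step to step. Values near 0 make it nearly independent of $\xi_{n-1}$. These are the two extremes discussed above.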

It is desired to estimate the values of $\xi_n$ and $x_n$ given
the measurements $Z_n$. The appropriate likelihood function to
maximize would be

$$l(\xi_n, x_n, Z_n) = f(\xi_n, x_n \mid Z_n) \qquad (7.2.2)$$

where $f(\xi_n, x_n \mid Z_n)$ is the conditional probability density
function of $\xi_n$ and $x_n$ given the measurements $Z_n$. The choice
of the proper likelihood function is not as obvious in the
case of time varying noise statistics as was the case when 
the statistics were assumed to be time invariant. Two other 
possibilities will be discussed subsequently. 

From Bayes' rule,

$$f(\xi_n, x_n \mid Z_n) = \frac{f(\xi_n, x_n, z_n, Z_{n-1})}{f(z_n, Z_{n-1})} = \frac{f(z_n \mid Z_{n-1}, x_n, \xi_n)\, f(\xi_n, x_n \mid Z_{n-1})}{f(z_n \mid Z_{n-1})} \qquad (7.2.3)$$

where

$$f(z_n \mid Z_{n-1}) = \iint_{-\infty}^{\infty} \frac{f(z_n, Z_{n-1}, x_n, \xi_n)}{f(Z_{n-1})}\, d\xi_n\, dx_n = \iint_{-\infty}^{\infty} f(z_n \mid Z_{n-1}, x_n, \xi_n)\, f(x_n, \xi_n \mid Z_{n-1})\, d\xi_n\, dx_n \qquad (7.2.4)$$


Then the logarithm of the likelihood function is

$$\begin{aligned} L_n(\xi_n, x_n, Z_n) &= \ln f(\xi_n, x_n \mid Z_n) \\ &= \ln f(\xi_n, x_n \mid Z_{n-1}) + \ln f(z_n \mid Z_{n-1}, x_n, \xi_n) - \ln f(z_n \mid Z_{n-1}) \end{aligned} \qquad (7.2.5)$$

The gradient of $L_n$ with respect to the parameters to be
estimated is then

$$\frac{\partial L_n}{\partial a_n} = \frac{\partial \ln f(\xi_n, x_n \mid Z_{n-1})}{\partial a_n} + \frac{\partial \ln f(z_n \mid Z_{n-1}, x_n, \xi_n)}{\partial a_n} \qquad (7.2.6)$$

where $a_n^T = (x_n^T, \xi_n^T)$. The last term of (7.2.5) does not
appear because $f(z_n \mid Z_{n-1})$ is not a function of $a_n$.


The density function $f(z_n \mid Z_{n-1}, x_n, \xi_n)$ is easily found:

$$f(z_n \mid Z_{n-1}, x_n, \xi_n) = \frac{1}{(2\pi)^{y/2}\,|R_n|^{1/2}} \exp\left[-\tfrac{1}{2}(z_n - H_n x_n)^T R_n^{-1} (z_n - H_n x_n)\right] \qquad (7.2.7)$$


The real difficulty comes in finding the density function
$f(\xi_n, x_n \mid Z_{n-1})$. It can be obtained, at least in theory, from
previously obtained density functions. If it is assumed that
initially $\xi_0$ and $x_0$ are independent, then

$$f(\xi_0, x_0) = f(x_0)\, f(\xi_0)$$

where $f(x_0)$ is the a priori probability density function of
the initial state and $f(\xi_0)$ is the a priori probability density
of the initial value of $\xi$, both of which are presumed to
be known. Then before the first measurement,

$$f(\xi_1, x_1) = f(x_1 \mid \xi_1)\, f(\xi_1)$$

Assuming $f(x_0)$ is a normal density function with mean $\hat x_{0|0}$
and covariance $P_{0|0}$, it is easy to show that

$$f(x_1 \mid \xi_1) = \frac{1}{|2\pi P_{1|0}|^{1/2}} \exp\left[-\tfrac{1}{2}(x_1 - \hat x_{1|0})^T P_{1|0}^{-1} (x_1 - \hat x_{1|0})\right] \qquad (7.2.8)$$

where

$$\hat x_{1|0} = \Phi(1,0)\,\hat x_{0|0}, \qquad P_{1|0} = \Phi(1,0)\,P_{0|0}\,\Phi^T(1,0) + \Gamma_1 Q_1 \Gamma_1^T$$

Assuming the model for $\xi_n$ previously given, $f(\xi_1)$ is a
Gamma probability density function with known parameters.
Then using (7.2.3) with (7.2.4), (7.2.7), and (7.2.8), the
density function $f(\xi_1, x_1 \mid z_1)$ can be found. In most cases,
evaluation of this density function will be very complicated 
but it can be performed in theory. 
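In the smallest possible case, the evaluation "in theory" can be carried out by brute force on a grid. The sketch below makes several assumptions. The state is scalar and $z_1 = x_1 + v_1$. $\xi_1$ is taken to be only the measurement noise variance, with Q known and absorbed into the predicted variance $P_{1|0}$, so $x_1$ does not depend on $\xi_1$. The prior on $\xi_1$ is a Gamma density. All numbers are hypothetical. The normalizing integral (7.2.4) becomes a Riemann sum, and the joint maximizer of (7.2.3) is read off the grid.

```python
import numpy as np
from math import lgamma


def gamma_pdf(s, shape, scale):
    """Gamma density with the given shape and scale, for s > 0."""
    return np.exp((shape - 1.0) * np.log(s) - s / scale - lgamma(shape) - shape * np.log(scale))


def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def grid_posterior(z1, x_pred, p_pred, shape, scale, x_grid, xi_grid):
    """Evaluate f(xi_1, x_1 | z_1) on a grid via (7.2.3) and (7.2.4).

    Scalar sketch: xi_1 is the measurement noise variance and z_1 = x_1 + v_1.
    """
    X, XI = np.meshgrid(x_grid, xi_grid, indexing="ij")
    joint = (normal_pdf(z1, X, XI)                  # f(z_1 | x_1, xi_1), cf. (7.2.7)
             * normal_pdf(X, x_pred, p_pred)        # f(x_1 | xi_1), cf. (7.2.8)
             * gamma_pdf(XI, shape, scale))         # f(xi_1)
    dx, dxi = x_grid[1] - x_grid[0], xi_grid[1] - xi_grid[0]
    evidence = joint.sum() * dx * dxi               # (7.2.4) as a Riemann sum
    post = joint / evidence
    i, j = np.unravel_index(np.argmax(post), post.shape)
    return post, x_grid[i], xi_grid[j], dx, dxi


x_grid = np.linspace(-5.0, 8.0, 401)
xi_grid = np.linspace(0.01, 6.0, 300)
post, x_map, xi_map, dx, dxi = grid_posterior(3.0, 0.0, 1.0, 2.0, 0.5, x_grid, xi_grid)
print(x_map, xi_map)
```

The cost of such a grid grows exponentially with the dimensions of $x_n$ and $\xi_n$. This is why the text expects the exact densities to be very complicated in realistic problems.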

Once the necessary density functions in (7.2.5) are
found, the estimates of $\xi_n$ and $x_n$ can be found by finding
the zero points of the likelihood equations (7.2.6). Some
iterative procedure will be needed for the solution of these
equations.

Assuming that a solution of the likelihood equations 
can be found, much work needs to be done to determine if such 
a solution is unique and if it is, what are its asymptotic 
properties. The situation is much more complicated than the 
case when the noise covariance matrices were assumed to be 
time invariant. If the noise covariance parameters change 
rapidly with time and are not sufficiently correlated with 
past values of the noise parameters, then there may not be 
sufficient information in the measurements to uniquely define 
the estimates. In such a situation, the maximum likelihood 
estimator may be required to estimate the value of the noise 
parameters essentially on the information contained in a 
single measurement. If the measurement is of small dimension 
compared with the number of parameters being estimated, there 
may be insufficient information in the measurement to estimate
the noise parameters. This is not a shortcoming of this 
particular type of estimation, but rather a fundamental problem 
of trying to estimate the value of a quantity with insuffi- 
cient information. A similar problem was encountered in 
Chapter 2 when the state of the system was being estimated 
without prior information. Until sufficient information was 
gathered, a unique state estimate could not be defined. 

Assuming that a unique solution to the problem exists, 
finding its asymptotic properties will be difficult. However,
it should not be expected that the estimator for $\xi_n$ is
a consistent estimator when there is "noise" driving the
vector of noise covariance parameters. This is entirely
analogous to the fact that a Kalman estimator for the state
is not consistent when there is noise driving the state,
or, in other words, the covariance of the estimation error
does not go to zero as the number of measurements goes to 
infinity. Therefore, it can be anticipated that the maximum 
likelihood estimator of the state which uses estimates of $\xi_n$
to compute the appropriate filter gains will not converge to
the estimates that would be obtained if the noise covariance 
parameters were known precisely. However, if the noise 
covariance estimator operates properly, this difference may 
be small. 

As was mentioned previously, the likelihood function
given above is not the only possibility that might be
considered. Another solution to the problem might be found
by simultaneously estimating the state and the values of
the noise covariances at all times up to and including time
n. In such a situation, a likelihood function of the form


$$l(\xi_1,\dots,\xi_n, x_n, Z_n) = f(\xi_1,\dots,\xi_n, x_n \mid Z_n) \qquad (7.2.9)$$

might be chosen. Define

$$\Omega_n^T = (\xi_1^T, \dots, \xi_n^T)$$

Then by Bayes' rule,


$$f(\Omega_n, x_n \mid Z_n) = f(x_n \mid \Omega_n, Z_n)\, f(\Omega_n \mid Z_n) \qquad (7.2.10)$$

$f(x_n \mid \Omega_n, Z_n)$ is the probability density function of the state
given the measurements $Z_n$ and the values of the noise
covariance parameters at all times. From Chapter 2,


$$f(x_n \mid \Omega_n, Z_n) = \frac{1}{|2\pi P_{n|n}|^{1/2}} \exp\left[-\tfrac{1}{2}(x_n - \hat x_{n|n})^T P_{n|n}^{-1} (x_n - \hat x_{n|n})\right] \qquad (7.2.11)$$

where $\hat x_{n|n}$ is a function of $Z_n$ and $\Omega_n$, and $P_{n|n}$ is a function
of $\Omega_n$.


By application of Bayes' rule,

$$\begin{aligned} f(\Omega_n \mid Z_n) &= \frac{f(\xi_n, \Omega_{n-1}, z_n, Z_{n-1})}{f(z_n, Z_{n-1})} = \frac{f(z_n \mid \Omega_n, Z_{n-1})\, f(\Omega_n, Z_{n-1})}{f(z_n \mid Z_{n-1})\, f(Z_{n-1})} \\ &= \frac{f(z_n \mid \Omega_n, Z_{n-1})\, f(\xi_n \mid \Omega_{n-1}, Z_{n-1})}{f(z_n \mid Z_{n-1})}\, f(\Omega_{n-1} \mid Z_{n-1}) \end{aligned}$$

Repeating the above procedure, it is easy to show that

$$f(\Omega_n \mid Z_n) = f(\xi_0) \prod_{i=1}^{n} \frac{f(z_i \mid \Omega_i, Z_{i-1})\, f(\xi_i \mid \Omega_{i-1}, Z_{i-1})}{f(z_i \mid Z_{i-1})}$$


$f(\xi_0)$ is the a priori probability density function of the
initial value of the noise covariance parameters. It is easy
to show that

$$f(z_i \mid \Omega_i, Z_{i-1}) = \frac{1}{(2\pi)^{y/2}\,|B_i|^{1/2}} \exp\left[-\tfrac{1}{2}(z_i - H_i \hat x_{i|i-1})^T B_i^{-1} (z_i - H_i \hat x_{i|i-1})\right] \qquad (7.2.12)$$

where

$$B_i = R_i + H_i P_{i|i-1} H_i^T$$

$\hat x_{i|i-1}$ is the maximum likelihood estimate of $x_i$ after i-1
measurements using the true values of $\Omega_i$ to compute the proper
filter gains, and $P_{i|i-1}$ is the conditional covariance of $x_i$
about $\hat x_{i|i-1}$.
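The factor (7.2.12) is the familiar Kalman filter innovation density. Summing its logarithm over i gives the data-dependent part of $\ln f(\Omega_n \mid Z_n)$. The sketch below assumes a scalar state and measurement with known transition `phi`, known driving noise variance `q`, and H = 1 by default. A hypothesized time history `r_seq` of the measurement noise variance plays the role of $\Omega_n$. All names and numbers are hypothetical.

```python
import numpy as np


def innovation_loglik(z, phi, q, r_seq, x0, p0, h=1.0):
    """Sum over i of ln f(z_i | Omega_i, Z_{i-1}) from (7.2.12), scalar case."""
    x, p, total = x0, p0, 0.0
    for zi, ri in zip(z, r_seq):
        x_pred = phi * x                    # x_hat_{i|i-1}
        p_pred = phi * p * phi + q          # P_{i|i-1}
        b = ri + h * p_pred * h             # B_i = R_i + H_i P_{i|i-1} H_i^T
        nu = zi - h * x_pred                # innovation z_i - H_i x_hat_{i|i-1}
        total += -0.5 * (np.log(2.0 * np.pi * b) + nu**2 / b)
        k = p_pred * h / b                  # gain computed from the hypothesized R_i
        x = x_pred + k * nu
        p = (1.0 - k * h) * p_pred
    return total


rng = np.random.default_rng(4)
n = 500
r_true = np.where(np.arange(n) < 250, 0.5, 2.0)    # hypothetical R_i history
x_true, z = 0.0, np.empty(n)
for i in range(n):
    x_true = 0.95 * x_true + rng.normal(0.0, np.sqrt(0.1))
    z[i] = x_true + rng.normal(0.0, np.sqrt(r_true[i]))
ll_true = innovation_loglik(z, 0.95, 0.1, r_true, 0.0, 1.0)
ll_wrong = innovation_loglik(z, 0.95, 0.1, 10.0 * r_true, 0.0, 1.0)
print(ll_true, ll_wrong)                            # the correct history scores higher here
```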

From the model of $\xi_i$, it can be seen that $f(\xi_i \mid \Omega_{i-1}, Z_{i-1})$
is a Gamma probability density function (shifted by $\Psi(i,i-1)\,\xi_{i-1}$)
with conditional mean

$$E(\xi_i \mid \Omega_{i-1}, Z_{i-1}) = \Psi(i,i-1)\,\xi_{i-1} + \bar u_i$$

where $\bar u_i$ is the mean of the distribution of the noise covariance
parameter driving noise. The conditional covariance of the
distribution is

$$\operatorname{cov}(\xi_i \mid \Omega_{i-1}, Z_{i-1}) = \operatorname{cov}(u_i)$$

In obtaining these expressions, use was made of the fact
that $u_i$ is independent of $Z_{i-1}$ and $\Omega_{i-1}$. $f(z_i \mid Z_{i-1})$ need
not be evaluated since it is not a function of $x_n$ or $\Omega_n$.

The logarithm of the likelihood function (7.2.9) is

$$L_n(\Omega_n, x_n, Z_n) = \ln f(x_n \mid \Omega_n, Z_n) + \ln f(\Omega_n \mid Z_n) \qquad (7.2.13)$$

and the gradient of $L_n$ with respect to the parameters to be
estimated is

$$\frac{\partial L_n}{\partial a_n} = \frac{\partial \ln f(x_n \mid \Omega_n, Z_n)}{\partial a_n} + \frac{\partial \ln f(\Omega_n \mid Z_n)}{\partial a_n} \qquad (7.2.14)$$

where now $a_n^T = (x_n^T, \Omega_n^T)$. The maximum likelihood estimate of
$a_n$ is the value of $a_n$ which makes all components of (7.2.14)
zero. Since $f(\Omega_n \mid Z_n)$ does not depend on $x_n$, the $x_n$ component
of (7.2.14) comes only from (7.2.11) and vanishes at $x_n = \hat x_{n|n}$. Thus
the estimate of the state $x_n$ is just the maximum likelihood
state estimate which uses the estimates of $\Omega_n$ to compute the
proper filter gains. The estimates of $\Omega_n$ are found from the
solution of the likelihood equations associated with the
gradient of the likelihood function with respect to $\Omega_n$.

It should be noted that finding the necessary density 
functions in this likelihood equation is considerably easier 
than finding the density functions in the previous likelihood
equations (7.2.6). However, it should also be noted that the
number of likelihood equations that must be simultaneously
solved is much larger than in the previous case. In addition
to the likelihood equation associated with the state, there
is one likelihood equation associated with the value of $\xi_i$ at
every measurement time. Thus as n becomes large, the number
of likelihood equations also becomes large. 

A third possibility for the likelihood function would be

$$l(x_n, Z_n) = f(x_n \mid Z_n) \qquad (7.2.15)$$

In this case, only the state is to be estimated, not the
values of the noise covariance parameters. However, it will
be shown that finding the above probability density function
is even more difficult than in the previous two cases. From
Bayes' rule,

$$f(x_n \mid Z_n) = \int f(x_n \mid Z_n, \Omega_n)\, f(\Omega_n \mid Z_n)\, d\Omega_n$$

Then the gradient of the logarithm of the likelihood function
(7.2.15) is

$$\frac{\partial \ln f(x_n \mid Z_n)}{\partial x_n} = \frac{1}{f(x_n \mid Z_n)} \int_{-\infty}^{\infty} \frac{\partial f(x_n \mid Z_n, \Omega_n)}{\partial x_n}\, f(\Omega_n \mid Z_n)\, d\Omega_n$$


Evaluating this expression in a realistic situation would 
be very complicated and finding the zero points of the equa- 
tion would be even more involved. Thus the number of 
equations that must be solved has been reduced over that of 
the two previous approaches, but the complexity of the 
equations is considerably increased. 
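When $\Omega_n$ can take only a few values, the integral above becomes a weighted sum and $f(x_n \mid Z_n)$ becomes a Gaussian mixture. Its likelihood equation can then be solved by a one-dimensional root search. The sketch below assumes a scalar state and two hypothetical values of $\Omega_n$. Each value has a given posterior weight $f(\Omega_n \mid Z_n)$ and yields its own $\hat x_{n|n}$ and $P_{n|n}$ from (7.2.11). Bisection is used as a simple stand-in for the iterative procedure the text calls for.

```python
import numpy as np


def mixture_pdf_and_slope(x, w, m, s2):
    """f(x|Z) = sum_i w_i N(x; m_i, s2_i) and its derivative with respect to x."""
    comps = w * np.exp(-0.5 * (x - m) ** 2 / s2) / np.sqrt(2.0 * np.pi * s2)
    return comps.sum(), np.sum(comps * (m - x) / s2)


def score(x, w, m, s2):
    """d ln f(x|Z)/dx: the discrete analogue of the integral expression above."""
    f, df = mixture_pdf_and_slope(x, w, m, s2)
    return df / f


def mode_by_bisection(w, m, s2, tol=1e-10):
    """Find a zero of the score between the smallest and largest component means.

    The score is positive below min(m) and negative above max(m), so a root exists
    there; for a multimodal mixture this returns only one of the stationary points.
    """
    lo, hi = float(np.min(m)), float(np.max(m))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, w, m, s2) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


w = np.array([0.7, 0.3])        # hypothetical posterior weights f(Omega | Z_n)
m = np.array([1.0, 1.6])        # x_hat_{n|n} under each value of Omega
s2 = np.array([0.5, 1.5])       # P_{n|n} under each value of Omega
x_ml = mode_by_bisection(w, m, s2)
print(x_ml, w @ m)              # the mode generally differs from the weighted mean
```

With a continuous $\Omega_n$ the sum becomes the multidimensional integral above, and this is where the added complexity noted in the text arises.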

In general, the state estimates obtained from the 
solution of these three different likelihood functions will 
be different. Which estimate is "better" depends upon what
information is to be extracted from the measurements.

Solution of the first problem will result in estimates of 
the current state and the current value of the noise covari- 
ance parameters. The solution of the second problem will 
result in estimates of the current state and the values of 
the noise covariances at all times. The solution of the third 
problem will result only in the estimate of the state, with 
no information provided about the value of the noise statistics. 

The tradeoff between the number of equations to be solved 
and their complexity seems to be a general feature of maximum 
likelihood estimation. As the number of parameters to be 
estimated increases, there are more equations that must be 
simultaneously solved, but it is usually easier to find the 
necessary probability density functions. 

Maximum likelihood estimation can be used to estimate 
more general parameters of the system and measurement than 
the statistics of the noises. These problems can be handled
within the framework of the maximum likelihood estimators
already developed. The likelihood functions of Chapter 3
were written as functions of the state $x_n$, the measurements
$Z_n$, and the parameters R and Q. In fact, the likelihood
function is a function of all parameters of the system, 
namely the state transition matrix, forcing function matrix, 
and the observation matrix. The dependency of the likeli- 
hood function on these additional parameters was not indicated 
because it was previously assumed that the parameters were 
known precisely. Now it is assumed that some of these para- 
meters are not known precisely a priori, but rather knowledge 
of them is described by some a priori probability density 
function in a fashion similar to that used in describing the 
uncertainty in R and Q. 

Let $\beta$ represent the vector of any additional parameters
of the problem that are to be estimated. For simplicity it
is assumed that $\beta$ is time invariant. The likelihood function
appropriate for this problem is

$$l(R, Q, x_n, Z_n, \beta) = f(R, Q, x_n, \beta \mid Z_n) \qquad (7.2.16)$$

where $f(R, Q, x_n, \beta \mid Z_n)$ is the joint conditional probability
density function of the parameters R, Q, $x_n$, and $\beta$ given the
measurements $Z_n$. From Bayes' rule,

$$f(R, Q, x_n, \beta \mid Z_n) = f(x_n \mid Z_n, R, Q, \beta)\, f(R, Q, \beta \mid Z_n) \qquad (7.2.17)$$
Then the logarithm of the likelihood function is

$$\begin{aligned} L_n(R, Q, x_n, \beta, Z_n) &= \ln f(R, Q, x_n, \beta \mid Z_n) \\ &= \ln f(x_n \mid Z_n, R, Q, \beta) + \ln f(R, Q, \beta \mid Z_n) \end{aligned} \qquad (7.2.18)$$

Then

$$\frac{\partial L_n}{\partial x_n} = \frac{\partial \ln f(x_n \mid Z_n, R, Q, \beta)}{\partial x_n} \qquad (7.2.19)$$

But the zeros of the likelihood equations (7.2.19) can be
shown to occur when

$$x_n \to \hat x_{n|n}(\hat R_n, \hat Q_n, \hat\beta_n, Z_n)$$

This says that the estimate of $x_n$ is just the maximum likelihood
estimator of the state that uses estimates of R, Q, and
$\beta$ to compute the proper filter gains. Estimates of R and Q
are found in the same manner as in Chapter 3. Estimates of
$\beta$ are found from the solution of the additional likelihood
equations

$$\left.\frac{\partial L_n}{\partial \beta}\right|_{x_n\to\hat x_{n|n},\;R\to\hat R_n,\;Q\to\hat Q_n,\;\beta\to\hat\beta_n} = 0$$

The likelihood equations for the state $x_n$, the noise covariance
parameters R and Q, and the additional parameters $\beta$
must be solved simultaneously, which generally requires an
iterative solution.

Thus it can be seen that more general parameters can be 
estimated in the same way as the noise covariance parameters, 
except that for each additional parameter to be estimated, 
an additional likelihood equation must be solved. 
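As an example of one more likelihood equation, take $\beta$ to be the scalar element of the state transition matrix. The sketch below simplifies the simultaneous problem in the text. It assumes R and Q are known and H = 1, so only $\beta$ (called `phi`) is estimated. It maximizes the innovation log likelihood over a grid of candidate values instead of solving $\partial L_n/\partial\beta = 0$ iteratively. The names, grid, and simulated data are hypothetical.

```python
import numpy as np


def innovation_loglik(z, phi, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter innovation log-likelihood for a given transition phi."""
    x, p, total = x0, p0, 0.0
    for zi in z:
        x_pred, p_pred = phi * x, phi * p * phi + q
        b = r + p_pred                          # innovation variance (H = 1)
        nu = zi - x_pred
        total += -0.5 * (np.log(2.0 * np.pi * b) + nu**2 / b)
        k = p_pred / b
        x, p = x_pred + k * nu, (1.0 - k) * p_pred
    return total


def estimate_phi(z, q, r, grid):
    """Maximize the likelihood over a grid of candidate phi values."""
    ll = np.array([innovation_loglik(z, phi, q, r) for phi in grid])
    return grid[np.argmax(ll)]


rng = np.random.default_rng(5)
phi_true, q, r = 0.9, 1.0, 1.0
x, z = 0.0, np.empty(2000)
for i in range(len(z)):
    x = phi_true * x + rng.normal(0.0, np.sqrt(q))
    z[i] = x + rng.normal(0.0, np.sqrt(r))
phi_hat = estimate_phi(z, q, r, np.linspace(0.5, 0.99, 50))
print(phi_hat)
```

A grid search is practical only for one or two parameters. With R, Q, and $\beta$ all unknown, the iterative simultaneous solution described above is needed.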

As in the case of time varying noise statistics, much 
work needs to be done concerning the asymptotic properties 
and possible convergence problems associated with the
estimation of these additional parameters.

One final word should be said about the application of 
maximum likelihood estimators of the state and noise statis- 
tics in problems when there may be errors in the dynamical 
model of the state. Jazwinski has shown that the effects of 
these modeling errors can often be characterized as an addi- 
tional noise driving the state, where the statistics of this 
noise are unknown. If a maximum likelihood estimator of the 
mean and covariance of the "effective driving noise" is 
employed, there is good reason to believe that the perfor- 
mance of the state estimator can be considerably improved. 

In such cases, the estimates of the statistics of the noise 
may have little physical significance, since there is actually 
no "modeling error noise" driving the state. However, if the 
effect of the modeling errors can be accurately represented 
as such a noise, then estimating the statistics of this noise 
can improve the state estimation and minimize possible diver- 
gence problems within the filter. 
Appendix A

MATRIX AND VECTOR OPERATIONS 


A.1 The Generalized Inverse

The generalized inverse is an important concept in 
matrix theory because it provides an extension of the concept 
of an inverse which applies to all matrices. Deutsch, Rao, 
and Rust discuss the theory and application of the generalized 
inverse in such problem areas as numerical analysis and 
least squares estimation. This appendix closely follows the 
work of Deutsch. 

The generalized inverse of an m x n matrix A of rank r
is an n x m matrix $A^{\#}$ of rank r such that

$$A\,A^{\#}A = A \qquad (A.1.1)$$

If A = 0, define $0^{\#} = 0^T$. Both $A^{\#}A$ and $AA^{\#}$ are idempotent
because they are equal to their squares:

$$(A^{\#}A)^2 = A^{\#}A\,A^{\#}A = A^{\#}A$$

$$(AA^{\#})^2 = AA^{\#}AA^{\#} = AA^{\#}$$

If A is of rank r > 0, then it has a rank factorization
of the form

$$A = B\,C$$

where B is an m x r matrix and C is an r x n matrix, with the
rank of both B and C equal to r.

The pseudoinverse of a matrix, often called the Moore-Penrose
generalized inverse, is defined as

$$A^{+} = C^T (C\,C^T)^{-1} (B^T B)^{-1} B^T \qquad (A.1.2)$$

with $0^{+} = 0^T$.

A pseudoinverse is a generalized inverse because (A.1.2) can
be shown to satisfy (A.1.1). If A is nonsingular, then
$A^{+} = A^{\#} = A^{-1}$.
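Formula (A.1.2) can be checked numerically. The sketch below obtains one rank factorization A = BC from the singular value decomposition. This is a convenient choice, not the only one, and not necessarily the construction used by Deutsch. It then forms $A^{+}$ by (A.1.2) and compares it with NumPy's `np.linalg.pinv`. The function names are hypothetical, and A is assumed to be nonzero.

```python
import numpy as np


def rank_factorization(A, tol=1e-10):
    """One possible A = B C with B (m x r) and C (r x n) of full rank r, taken from the SVD.

    Assumes A is not the zero matrix.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))            # numerical rank
    return U[:, :r] * s[:r], Vt[:r, :]


def pinv_from_factors(B, C):
    """(A.1.2): A+ = C^T (C C^T)^{-1} (B^T B)^{-1} B^T."""
    return C.T @ np.linalg.inv(C @ C.T) @ np.linalg.inv(B.T @ B) @ B.T


A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])                # rank 2
B, C = rank_factorization(A)
A_plus = pinv_from_factors(B, C)
print(B.shape, C.shape)                        # (4, 2) (2, 3)
```

Different rank factorizations give the same $A^{+}$, which is consistent with property 3 below, the uniqueness of the pseudoinverse.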

There are several advantages for employing the pseudo- 
inverse rather than the more inclusive generalized inverse. 
These stem from the following properties: 

1) The pseudoinverse of a pseudoinverse yields the
original matrix. That is, $(A^{+})^{+} = A$.

2) $(A A^{+})$ and $(A^{+} A)$ are symmetric matrices.

3) The pseudoinverse of a matrix is unique. 

Rust discusses an algorithm suitable for digital computer 
operation for finding the generalized inverse of a matrix. 
However, in certain special cases, the solution can be obtained
directly.

If $(A^T A)$ is of full rank, then

$$A^{+} = (A^T A)^{-1} A^T$$

If $(A A^T)$ is of full rank, then

$$A^{+} = A^T (A A^T)^{-1}$$
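The two special cases amount to one linear solve each. The sketch below computes them with `np.linalg.solve` rather than an explicit inverse, which is a standard numerical choice rather than anything prescribed here. It compares the results with `np.linalg.pinv` on a small tall matrix and its transpose. The function names are hypothetical.

```python
import numpy as np


def pinv_full_column_rank(A):
    """A+ = (A^T A)^{-1} A^T, valid when A^T A is nonsingular."""
    return np.linalg.solve(A.T @ A, A.T)


def pinv_full_row_rank(A):
    """A+ = A^T (A A^T)^{-1}, valid when A A^T is nonsingular."""
    return np.linalg.solve(A @ A.T, A).T     # ((A A^T)^{-1} A)^T, using the symmetry of A A^T


tall = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # A^T A is 2 x 2 and nonsingular
wide = tall.T                                            # A A^T is 2 x 2 and nonsingular
print(pinv_full_column_rank(tall))
print(pinv_full_row_rank(wide))
```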


A.2 The Matrix Inversion Lemma

If $A_1$ is an n x n nonsingular matrix, $A_2$ is an n x m
matrix, $A_3$ is an m x m nonsingular matrix, and $A_4$ is an
m x n matrix, then

$$(A_1 + A_2 A_3 A_4)^{-1} = A_1^{-1} - A_1^{-1} A_2 \left(A_4 A_1^{-1} A_2 + A_3^{-1}\right)^{-1} A_4 A_1^{-1}$$

The proof is by direct substitution. 
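A numerical spot check of the lemma may be reassuring alongside the algebraic proof. The sketch below uses randomly generated matrices, which are assumptions for illustration only. When $A_1^{-1}$ is already available, the right hand side requires only an m x m inversion. This is the property that makes the lemma useful in recursive filtering.

```python
import numpy as np


def inverse_by_lemma(A1, A2, A3, A4):
    """(A1 + A2 A3 A4)^{-1} via the matrix inversion lemma of A.2."""
    A1_inv = np.linalg.inv(A1)
    inner = A4 @ A1_inv @ A2 + np.linalg.inv(A3)     # only an m x m matrix to invert
    return A1_inv - A1_inv @ A2 @ np.linalg.inv(inner) @ A4 @ A1_inv


rng = np.random.default_rng(6)
n, m = 5, 2
A1 = 4.0 * np.eye(n) + 0.1 * rng.normal(size=(n, n))
A2 = rng.normal(size=(n, m))
A3 = np.diag([2.0, 3.0])
A4 = rng.normal(size=(m, n))
lhs = np.linalg.inv(A1 + A2 @ A3 @ A4)
print(np.allclose(lhs, inverse_by_lemma(A1, A2, A3, A4)))
```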

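A.3 Matrix and Vector Derivatives

Certain matrix and vector identities are needed in the
main text. The purpose of this appendix is to derive the
general results applied there. The following notation is used
here:

x: an n x 1 column vector
L(x): a scalar function of the vector x and possibly other parameters
Y: an m x m matrix
$Y^{-1}$: the inverse of Y
$|Y|$: the determinant of Y
U: the cofactor matrix of Y, such that $Y^{-1} = U^T/|Y|$, where $U^T$ is the transpose of U

$$\frac{\partial L(x)}{\partial x} = \left[\frac{\partial L}{\partial x_1}, \frac{\partial L}{\partial x_2}, \dots, \frac{\partial L}{\partial x_n}\right], \quad \text{a } 1 \times n \text{ row vector}$$

$$\frac{\partial^2 L(x)}{\partial x\,\partial x} = \begin{bmatrix} \frac{\partial^2 L}{\partial x_1 \partial x_1} & \frac{\partial^2 L}{\partial x_2 \partial x_1} & \cdots & \frac{\partial^2 L}{\partial x_n \partial x_1} \\ \vdots & \vdots & & \vdots \\ \frac{\partial^2 L}{\partial x_1 \partial x_n} & \frac{\partial^2 L}{\partial x_2 \partial x_n} & \cdots & \frac{\partial^2 L}{\partial x_n \partial x_n} \end{bmatrix}, \quad \text{an } n \times n \text{ symmetric matrix}$$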

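1.

$$\frac{1}{|Y|}\frac{\partial |Y|}{\partial Y} = (Y^{-1})^T$$

Proof: By the cofactor expansion of the determinant of Y,

$$|Y| = \sum_{k=1}^{m} Y_{ik}\,U_{ik} \quad \text{for any } i$$

Then

$$\frac{\partial |Y|}{\partial Y_{j\ell}} = \sum_{k=1}^{m}\left(\frac{\partial Y_{ik}}{\partial Y_{j\ell}}\,U_{ik} + Y_{ik}\,\frac{\partial U_{ik}}{\partial Y_{j\ell}}\right)$$

By definition

$$U_{ik} = (-1)^{i+k} M_{ik}$$

where $M_{ik}$ is the minor of $Y_{ik}$, which is formed by evaluating
the determinant of the matrix obtained by deleting the row
and column containing the element $Y_{ik}$. Thus, from this
definition, $U_{ik}$ contains no element of row i of Y. Choosing the
expansion row i = j, it follows that

$$\frac{\partial U_{jk}}{\partial Y_{j\ell}} = 0$$

So

$$\frac{\partial |Y|}{\partial Y_{j\ell}} = \sum_{k=1}^{m} \frac{\partial Y_{jk}}{\partial Y_{j\ell}}\,U_{jk} = \sum_{k=1}^{m} \delta_{k\ell}\,U_{jk} = U_{j\ell}$$

where $\delta_{ij}$ is the Kronecker delta defined by

$$\delta_{ij} = 0 \quad (i \ne j), \qquad \delta_{ij} = 1 \quad (i = j)$$

Then

$$\frac{1}{|Y|}\frac{\partial |Y|}{\partial Y_{j\ell}} = \frac{1}{|Y|}\,U_{j\ell} = (Y^{-1})_{\ell j}$$

or

$$\frac{1}{|Y|}\frac{\partial |Y|}{\partial Y} = (Y^{-1})^T$$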
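Identity 1 can be checked by finite differences, using the fact that $\frac{1}{|Y|}\partial|Y|/\partial Y_{j\ell} = \partial \ln|Y|/\partial Y_{j\ell}$. As in the proof, the elements of Y are treated as independent, so no symmetry is imposed. The test matrix below is an arbitrary nonsymmetric example, chosen so that the transpose in the result matters. The function name is hypothetical.

```python
import numpy as np


def numeric_dlogdet(Y, h=1e-6):
    """Central differences of ln|Y| with respect to each element Y_jl (treated as independent)."""
    G = np.zeros_like(Y)
    for j in range(Y.shape[0]):
        for l in range(Y.shape[1]):
            E = np.zeros_like(Y)
            E[j, l] = h
            G[j, l] = (np.log(abs(np.linalg.det(Y + E)))
                       - np.log(abs(np.linalg.det(Y - E)))) / (2.0 * h)
    return G


Y = np.array([[3.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 2.0]])
print(np.allclose(numeric_dlogdet(Y), np.linalg.inv(Y).T, atol=1e-6))
```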


2. By entirely analogous procedures it can be shown that

$$\frac{1}{|Y+B|}\frac{\partial |Y+B|}{\partial Y} = \left[(Y+B)^{-1}\right]^T$$

where B is any matrix that is not a function of Y.


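3. Let Y be a function of a matrix Z. Then

$$\frac{1}{|Y|}\frac{\partial |Y|}{\partial Z_{j\ell}} = \sum_i \sum_k \frac{1}{|Y|}\frac{\partial |Y|}{\partial Y_{ik}}\frac{\partial Y_{ik}}{\partial Z_{j\ell}} = \sum_i \sum_k (Y^{-1})_{ki}\,\frac{\partial Y_{ik}}{\partial Z_{j\ell}} = \operatorname{Tr}\left(Y^{-1}\frac{\partial Y}{\partial Z_{j\ell}}\right)$$

where Tr( ) is the trace of the enclosed matrix.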
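Identity 3 can be checked the same way for a particular matrix function. The choice $Y(Z) = ZZ^T + I$ is an arbitrary assumption made only for this check. For this choice, $\partial Y/\partial Z_{j\ell} = E_{j\ell}Z^T + ZE_{j\ell}^T$, where $E_{j\ell}$ has a single unit entry in position $(j,\ell)$. The function names are hypothetical.

```python
import numpy as np


def Y_of(Z):
    """A hypothetical matrix function of Z used only for this check."""
    return Z @ Z.T + np.eye(Z.shape[0])


def dlogdet_trace(Z, j, l):
    """Tr(Y^{-1} dY/dZ_jl) with dY/dZ_jl = E_jl Z^T + Z E_jl^T."""
    E = np.zeros_like(Z)
    E[j, l] = 1.0
    dY = E @ Z.T + Z @ E.T
    return np.trace(np.linalg.solve(Y_of(Z), dY))


def dlogdet_numeric(Z, j, l, h=1e-6):
    """Central difference of ln|Y(Z)| with respect to Z_jl."""
    def logdet(M):
        return np.linalg.slogdet(Y_of(M))[1]
    E = np.zeros_like(Z)
    E[j, l] = h
    return (logdet(Z + E) - logdet(Z - E)) / (2.0 * h)


Z = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])
print(dlogdet_trace(Z, 2, 0), dlogdet_numeric(Z, 2, 0))
```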
4. By analogous procedures it can be shown that

$$\frac{1}{|Y+B|}\frac{\partial |Y+B|}{\partial Z_{j\ell}} = \operatorname{Tr}\left[(Y+B)^{-1}\frac{\partial (Y+B)}{\partial Z_{j\ell}}\right]$$

where both Y and B may be functions of Z.


5. Let a and b be any constant m x 1 column vectors. Then

$$\frac{\partial (a^T Y b)}{\partial Y_{j\ell}} = \sum_i \sum_k a_i\,b_k\,\delta_{ij}\,\delta_{k\ell} = a_j\,b_\ell$$

or

$$\frac{\partial (a^T Y b)}{\partial Y} = a\,b^T$$


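6. From the fact that $Y Y^{-1} = I$, it can be shown that

$$\frac{\partial Y}{\partial Z_{j\ell}}\,Y^{-1} + Y\,\frac{\partial Y^{-1}}{\partial Z_{j\ell}} = 0$$

Therefore

$$\frac{\partial Y^{-1}}{\partial Z_{j\ell}} = -Y^{-1}\,\frac{\partial Y}{\partial Z_{j\ell}}\,Y^{-1}$$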

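7.

$$\frac{\partial (a^T Y^{-1} b)}{\partial Z_{j\ell}} = -a^T Y^{-1}\,\frac{\partial Y}{\partial Z_{j\ell}}\,Y^{-1} b$$

If $Y = A Z B + C$, where A, B, and C are constant matrices,

$$\frac{\partial (a^T Y^{-1} b)}{\partial Z_{j\ell}} = -a^T Y^{-1} A\,\frac{\partial Z}{\partial Z_{j\ell}}\,B\,Y^{-1} b = -\sum_i \sum_k \left(A^T (Y^{-1})^T a\right)_i \delta_{ij}\,\delta_{k\ell}\left(B\,Y^{-1} b\right)_k = -\left(A^T (Y^{-1})^T a\, b^T (Y^{-1})^T B^T\right)_{j\ell}$$

or

$$\frac{\partial (a^T Y^{-1} b)}{\partial Z} = -A^T (Y^{-1})^T a\, b^T (Y^{-1})^T B^T$$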
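The final form of identity 7 combines identity 6 with the chain rule. A finite-difference check confirms it, including the transposes. The shapes and random entries below are arbitrary assumptions. The entries of A, Z, and B are kept small so that $Y = AZB + C$ stays well conditioned. The function names are hypothetical.

```python
import numpy as np


def grad_formula(a, b, A, B, C, Z):
    """-A^T (Y^{-1})^T a b^T (Y^{-1})^T B^T for Y = A Z B + C."""
    Yi = np.linalg.inv(A @ Z @ B + C)
    return -A.T @ Yi.T @ np.outer(a, b) @ Yi.T @ B.T


def grad_numeric(a, b, A, B, C, Z, h=1e-6):
    """Central differences of a^T Y^{-1} b with respect to each element of Z."""
    def f(M):
        return a @ np.linalg.inv(A @ M @ B + C) @ b
    G = np.zeros_like(Z)
    for j in range(Z.shape[0]):
        for l in range(Z.shape[1]):
            E = np.zeros_like(Z)
            E[j, l] = h
            G[j, l] = (f(Z + E) - f(Z - E)) / (2.0 * h)
    return G


rng = np.random.default_rng(7)
A = 0.3 * rng.normal(size=(3, 2))      # Y is 3 x 3, Z is 2 x 4
Z = 0.3 * rng.normal(size=(2, 4))
B = 0.3 * rng.normal(size=(4, 3))
C = 3.0 * np.eye(3)
a, b = rng.normal(size=3), rng.normal(size=3)
print(np.allclose(grad_formula(a, b, A, B, C, Z), grad_numeric(a, b, A, B, C, Z), atol=1e-6))
```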
Appendix B

EVALUATION OF EXPLICIT ESTIMATOR MEAN SQUARED ERROR 


In Section 4.4 expressions for explicit estimators of 
the diagonal elements of the measurement and driving noise 
covariance matrices were developed. Also evaluated were the 
conditional and unconditional means of the estimates and the 
conditional mean of the squared estimation error. In this 
appendix the unconditional mean of the squared estimation 
error is obtained. 

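From (4.4.16) the conditional mean of the squared R
estimation error is

$$E\left[(\hat R_n^{jj} - R^{jj})^2 \mid R, Q\right] = G_n^{jj} + (F_n^{jj})^2 \qquad (B.1)$$

where

$$G_n^{jj} = \left(\frac{n-1}{n}\right)^2 G_{n-1}^{jj} + \frac{2}{n^2}\left[\left(R + \Delta F_n - H_n P^*_{n|n} H_n^T\right)^{jj}\right]^2 \qquad (B.2)$$

$$F_n^{jj} = \frac{1}{n}\sum_{k=1}^{n} \Delta F_k^{jj} \qquad (B.3)$$

$$\Delta F_n = H_n P_{n|n} H_n^T + H_n P^*_{n|n} H_n^T - H_n A_n R - R A_n^T H_n^T \qquad (B.4)$$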


The unconditional mean of the squared estimation error
is then

$$E\left[(\hat R_n^{jj} - R^{jj})^2\right] = E(G_n^{jj}) + E\left[(F_n^{jj})^2\right] \qquad (B.5)$$
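But

$$E(G_n^{jj}) = \left(\frac{n-1}{n}\right)^2 E(G_{n-1}^{jj}) + \frac{2}{n^2}\, E\left\{\left[\left(R + \Delta F_n - H_n P^*_{n|n} H_n^T\right)^{jj}\right]^2\right\}$$

Define

$$\alpha^{*jj}_{n|n} = \left(H_n P^*_{n|n} H_n^T\right)^{jj} = h_n^{jT} P^*_{n|n} h_n^j$$

where $h_n^j$ is the jth column of $H_n^T$, that is, the jth row of $H_n$
written as a column. Then

$$E\left\{\left[\left(R + \Delta F_n - H_n P^*_{n|n} H_n^T\right)^{jj}\right]^2\right\} = \overline{(R^{jj})^2} + E\left[(\Delta F_n^{jj})^2\right] + \left(\alpha^{*jj}_{n|n}\right)^2 + 2E\left(R^{jj}\Delta F_n^{jj}\right) - 2\bar R^{jj}\alpha^{*jj}_{n|n} \qquad (B.6)$$

where

$$\bar R^{jj} = E(R^{jj}) \quad\text{and}\quad \overline{(R^{jj})^2} = E\left[(R^{jj})^2\right]$$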


It is assumed that the a priori values of R and Q used 
to compute all starred quantities are equal to the means of 
their respective distributions. Or 


$$R_0 = \bar R \quad\text{and}\quad Q_0 = \bar Q$$


Then using the results of Section 2.3 it can be shown that 


$$\bar P_{n|n} \equiv E(P_{n|n}) = P^*_{n|n}$$

and also

$$E(\Delta F_n) = 0$$


Define

$$\alpha^{jj}_{n|n} = \left(H_n P_{n|n} H_n^T\right)^{jj} = h_n^{jT} P_{n|n} h_n^j$$

Then

$$\Delta F_n^{jj} = \alpha^{jj}_{n|n} + \alpha^{*jj}_{n|n} - 2\,\alpha^{*jj}_{n|n}\,\frac{R^{jj}}{\bar R^{jj}}$$

and

$$E\left[(\Delta F_n^{jj})^2\right] = E\left[(\alpha^{jj}_{n|n})^2\right] - \left(\alpha^{*jj}_{n|n}\right)^2 - \frac{4\alpha^{*jj}_{n|n}}{\bar R^{jj}}\left[E\left(\alpha^{jj}_{n|n}R^{jj}\right) - \alpha^{*jj}_{n|n}\bar R^{jj}\right] + \left(\frac{2\alpha^{*jj}_{n|n}}{\bar R^{jj}}\right)^2 \Sigma_R^{jj}$$

where $\Sigma_R^{jk} = E[(R^{jj} - \bar R^{jj})^2]\,\delta_{jk}$, a diagonal matrix.

Define

$$Y^{j}_{n|n} = E\left(P_{n|n} R^{jj}\right) - \bar P_{n|n}\bar R^{jj}, \qquad d_n^j = h_n^{jT}\,Y^{j}_{n|n}\,h_n^j$$

Then

$$E\left[(\Delta F_n^{jj})^2\right] = E\left[(\alpha^{jj}_{n|n})^2\right] - \left(\alpha^{*jj}_{n|n}\right)^2 - \frac{4\alpha^{*jj}_{n|n}}{\bar R^{jj}}\,d_n^j + \left(\frac{2\alpha^{*jj}_{n|n}}{\bar R^{jj}}\right)^2 \Sigma_R^{jj} \qquad (B.7)$$

$E[(\alpha^{jj}_{n|n})^2]$ and $d_n^j$ must now be computed. Using (2.3.43) and
(2.3.44) it can be shown that

$$P_{n|n} = X_{n|0} P_{0|0} X_{n|0}^T + \sum_{k=1}^{n} X_{n|k}\left(A_k R A_k^T + D_k \Gamma_k Q \Gamma_k^T D_k^T\right) X_{n|k}^T$$

where

$$D_k = I - A_k H_k, \qquad X_{n|k} = D_n \Phi(n,n-1) \cdots D_{k+1}\Phi(k+1,k)$$

with $X_{k|k} = I$.
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n 


Then h jT P | hj = h jT X , P , xJJ. hjj + Y~ lvj T X I I k h n 

n n|nn n n|oo|on|on n n | k k k n|kn 

k=l 

n 

+ Y h-^ T X |, D, r v Qr^A T |, hi 
L-j n n|k k k n|k n 

k=l 

, *T. T , j.£ ,*T,T „t. £ j 

<A k X n|kV = (A k X n|kV 


But 


, T T,T . j.£ _ , T T,T „T.£j 

(r k D k X n|k h n ) " (r k D k A n|kV 


So 


h n Tx n|k« Tx n|k h n ’ (H - X - ' J ° < R >°" <*.^1-0 


* js , r) s£ *T, T| T^£j 

n“n|k~k' (R) A k A n|k H n‘ 


Similarly 


h2 T A_i, d. r. = ( (r'D'x;,, O J ~) fc (q) 


- ( (H X ..A*) 3 *) 2 (R) ££ 
v n n [k k 


T n T , T . vj = ,,r T n T i T . ^ ^ 2 mi** 


n n k k k k k n k n 


k~k n k n' 


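Define

$$(C'_{n|1})^{j\ell} = \sum_{k=1}^{n}\left((H_nX_{n|k}A^*_k)^{j\ell}\right)^2, \quad \text{a } y\times y \text{ matrix}$$

$$(L_{n|1})^{j\ell} = \sum_{k=1}^{n}\left((H_nX_{n|k}D_k\Gamma_k)^{j\ell}\right)^2, \quad \text{a } y\times n \text{ matrix}$$

$$e^j_{n|o} = h^{jT}_nX_{n|o}P_{o|o}X^T_{n|o}h^j_n, \quad \text{a scalar}$$

$$r^j = (R)^{jj}, \quad r \text{ a } y\times 1 \text{ vector}$$

$$q^j = (Q)^{jj}, \quad q \text{ an } n\times 1 \text{ vector}$$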


Then

$$h^{jT}_nP_{n|n}h^j_n = e^j_{n|o} + (C'_{n|1}r + L_{n|1}q)^j \tag{B.8}$$





Squaring (B.8), taking the unconditional expected value, and then subtracting the square of $a^{*jj}_{n|n}$ gives

$$E[(a^{jj}_{n|n})^2] - (a^{*jj}_{n|n})^2 = \left(C'_{n|1}\Sigma_RC'^T_{n|1} + L_{n|1}\Sigma_QL^T_{n|1}\right)^{jj}$$

where $\Sigma_Q^{jk} = E[(Q^{jj} - E(Q^{jj}))^2]\,\delta_{jk}$, a diagonal matrix.

In obtaining this expression it was assumed that $R$ and $Q$ are independent random variables and that $e^j_{n|o}$ is not a function of $R$ or $Q$.
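Because $h^{jT}_nP_{n|n}h^j_n = e^j_{n|o} + (C'_{n|1}r + L_{n|1}q)^j$ is linear in the diagonal elements of $R$ and $Q$, its variance follows directly from $\Sigma_R$ and $\Sigma_Q$ when those elements are independent. The sketch below compares that formula with a seeded Monte Carlo estimate. The matrices, means and variances are illustrative. The diagonal elements are drawn as independent Gaussians purely for convenience: only their means and variances enter the check, so occasional negative draws of $R^{jj}$ or $Q^{jj}$ do not affect it.

```python
import random

# Illustrative values: y = 2 measurement components, one process-noise component.
C_prime = [[0.8, 0.1], [0.3, 0.6]]          # C'_{n|1}
L = [[0.4], [0.9]]                           # L_{n|1}
e = [1.5, 2.0]                               # e^j_{n|o}
r_mean, r_var = [1.0, 0.5], [0.04, 0.01]     # E(R^{jj}) and Sigma_R^{jj}
q_mean, q_var = [0.2], [0.0025]              # E(Q^{jj}) and Sigma_Q^{jj}

def a_jj(j, r, q):
    """h^{jT} P_{n|n} h^j = e^j + (C' r + L q)^j, linear in r and q."""
    return (e[j] + sum(c * x for c, x in zip(C_prime[j], r))
            + sum(w * x for w, x in zip(L[j], q)))

def predicted_variance(j):
    """(C' Sigma_R C'^T + L Sigma_Q L^T)^{jj} for diagonal Sigma_R, Sigma_Q."""
    return (sum(c * c * v for c, v in zip(C_prime[j], r_var))
            + sum(w * w * v for w, v in zip(L[j], q_var)))

def sampled_variance(j, n_samples=50_000, seed=0):
    """Sample variance of a^{jj} with independent Gaussian diagonal entries."""
    rng = random.Random(seed)
    vals = []
    for _ in range(n_samples):
        r = [rng.gauss(mu, v ** 0.5) for mu, v in zip(r_mean, r_var)]
        q = [rng.gauss(mu, v ** 0.5) for mu, v in zip(q_mean, q_var)]
        vals.append(a_jj(j, r, q))
    mean = sum(vals) / len(vals)
    return sum((x - mean) ** 2 for x in vals) / (len(vals) - 1)
```

For each $j$ the sampled and predicted variances should agree to within sampling error. If the elements of $R$ and $Q$ were correlated, cross terms would appear and the formula would no longer hold.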

By a similar procedure it is easy to show that

$$d^j_n = (C'_{n|1})^{jj}\,\Sigma_R^{jj}$$


Define 


(C ' i \ fc" , _ 2 

1 n 1 ; 'Si 1* 


n 1 n 


R 


DD 


6 

Dk 


Then (B.7) becomes 


E[(4p3J) 2 ] = tci u E c n|i + L n 1 1 E l> 

k y 


,T 


T . \ D j 


It can also be shown that 


ECR^AF^ 11 ) = (C^)” Z 


DD 


DD 
So, after algebraic manipulation, (B.6) becomes


E[ ( (R + AF - 


:Wl = (R jj - a*} ) 2 + 


n 


H n P nln H n> ) 1 = (R 

n n n n 


a •? ) 2 + (c I , Y C T | , 
n|n n|l ^- R n|l 


+ t V t t 

+ L n| 1 ^ L n| 1 


where 


C | , = C ' | , + I 
n 1 n 1 


So E (G'' ^ ) = 
n 


n-1 


\ n / 


E(G n-l> + 4 [(Rjj - a n]n )2 +(C n|l C n|l 


+ L .. r 

n I 1 q n | 1 


(B.9) 


Evaluation of $E[(F^{jj}_n)^2]$ is considerably more difficult than evaluation of $E(G^{jj}_n)$, but it uses a similar procedure. It can be seen that


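$$F^{jj}_n = \frac{n-1}{n}F^{jj}_{n-1} + \frac{1}{n}\Delta F^{jj}_n$$

so that $F^{jj}_n = \frac{1}{n}\sum_{k=1}^{n}\Delta F^{jj}_k$, and

$$(F^{jj}_n)^2 = \left(\frac{n-1}{n}\right)^2(F^{jj}_{n-1})^2 + \frac{1}{n^2}(\Delta F^{jj}_n)^2 + \frac{2}{n^2}\sum_{k=1}^{n-1}\Delta F^{jj}_k\,\Delta F^{jj}_n$$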


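After a slight rearrangement of terms, and carrying the above sum to $n$ instead of $n-1$, it can be seen that

$$E[(F^{jj}_n)^2] = \left(\frac{n-1}{n}\right)^2E[(F^{jj}_{n-1})^2] + \frac{2}{n^2}\sum_{k=1}^{n}E(\Delta F^{jj}_k\,\Delta F^{jj}_n) - \frac{1}{n^2}E[(\Delta F^{jj}_n)^2] \tag{B.10}$$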
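The recursion for $F^{jj}_n$ and the squared form that leads to (B.10) hold sample by sample, before any expectation is taken, so they can be checked on any sequence of increments. The sketch below uses arbitrary illustrative numbers in place of $\Delta F^{jj}_k$. It confirms two things: $F^{jj}_n$ is the sample mean of the increments, and $(F^{jj}_n)^2 = ((n-1)/n)^2(F^{jj}_{n-1})^2 + (2/n^2)\sum_{k=1}^{n}\Delta F^{jj}_k\Delta F^{jj}_n - (1/n^2)(\Delta F^{jj}_n)^2$. Taking expectations of both sides then gives (B.10).

```python
def running_average(increments):
    """F_n = ((n-1)/n) F_{n-1} + (1/n) dF_n; returns [F_1, ..., F_N]."""
    f, history = 0.0, []
    for n, d in enumerate(increments, start=1):
        f = (n - 1) / n * f + d / n
        history.append(f)
    return history

def b10_rhs(increments, n, f_prev):
    """((n-1)/n)^2 F_{n-1}^2 + (2/n^2) sum_{k=1}^{n} dF_k dF_n - dF_n^2 / n^2."""
    d = increments[:n]
    cross = sum(d[k] * d[n - 1] for k in range(n))     # sum carried to k = n
    return ((n - 1) / n) ** 2 * f_prev ** 2 + 2 * cross / n**2 - d[n - 1] ** 2 / n**2

dF = [0.3, -1.2, 0.7, 2.5, -0.4]      # illustrative stand-ins for Delta F_k^{jj}
F = running_average(dF)
residuals = [F[n - 1] ** 2 - b10_rhs(dF, n, F[n - 2]) for n in range(2, len(dF) + 1)]
```

Every residual is zero up to rounding. The $-(1/n^2)$ term is the correction for including $k = n$ in the cross sum.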
After algebraic manipulation,


EfAF^AF^) 
k n 


E(a klk ' 

a n|n> ' a k]k 



2 a kfk 

[-J - r 33 - 


2 a 


|njn 


p3D 


|3 - 


n 

Following the same procedure as in finding $E[(\Delta F^{jj}_n)^2]$,

Q n|1 

n 

i ^ — > 

Define 


: k| 1 

^ c A|i 


n 

- 1 V 

n | 1 

" n L 
k=l 

r” 

n 

- 1 T 

J_i | -> 

n | 1 

",n Z_ 
k=l 

manipulation 


r-” * 
L ~ 
R R 


*j - 

k[k 

j j 


Then after algebraic manipulation, (B.10) becomes 


E[ (F^ j ) 2 ] = 


n-1 

n 


EKF^i) 2 ] + 4l(2 n C',, [ c;*! - C',, [ 

n k k 


« 2 "-„u E ^,,-Lni, r Q ^u» jj ) 


(B.11)


$C'_{n|1}$ and $L_{n|1}$ can be computed through a recursive relationship. From the definition of $C'_{n|1}$,

$$(C'_{n|1})^{j\ell} = \sum_{k=1}^{n}\left((H_nX_{n|k}A^*_k)^{j\ell}\right)^2 = \sum_{k=1}^{n}(h^{jT}_nX_{n|k}A^*_k)^{\ell}\,(A^{*T}_kX^T_{n|k}h^j_n)^{\ell}$$
Define

$$(\alpha_n)^{m\ell u} = \sum_{k=1}^{n}(X_{n|k}A^*_k)^{m\ell}(A^{*T}_kX^T_{n|k})^{\ell u}$$

Then

$$(C'_{n|1})^{j\ell} = (h^j_n)^m(\alpha_n)^{m\ell u}(h^j_n)^u \tag{B.12}$$

Repeated indices other than $\ell$ are summed.


But, since $X_{n|n} = I$,

$$(\alpha_n)^{m\ell u} = \sum_{k=1}^{n-1}(X_{n|k}A^*_k)^{m\ell}(A^{*T}_kX^T_{n|k})^{\ell u} + (A^*_n)^{m\ell}(A^{*T}_n)^{\ell u}$$

and

$$X_{n|k} = D_n\Phi(n,n-1)X_{n-1|k}$$

so

$$(\alpha_n)^{m\ell u} = (D_n\Phi(n,n-1))^{ms}(\alpha_{n-1})^{s\ell t}(\Phi^T(n,n-1)D_n^T)^{tu} + (A^*_n)^{m\ell}(A^{*T}_n)^{\ell u}$$


Therefore, $(\alpha_n)^{m\ell u}$ can be computed recursively and $C'_{n|1}$ found from (B.12). $L_{n|1}$ can be computed in a similar fashion. Define

$$(\beta_n)^{m\ell u} = \sum_{k=1}^{n}(X_{n|k}D_k\Gamma_k)^{m\ell}(\Gamma_k^TD_k^TX^T_{n|k})^{\ell u}$$

Then

$$(L_{n|1})^{j\ell} = (h^j_n)^m(\beta_n)^{m\ell u}(h^j_n)^u \tag{B.13}$$
and

$$(\beta_n)^{m\ell u} = (D_n\Phi(n,n-1))^{ms}(\beta_{n-1})^{s\ell t}(\Phi^T(n,n-1)D_n^T)^{tu} + (D_n\Gamma_n)^{m\ell}(\Gamma_n^TD_n^T)^{\ell u}$$
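The recursions for $\alpha_n$ and $\beta_n$ avoid re-forming every $X_{n|k}$ at each step. In the sketch below, each $(\alpha_n)^{m\ell u}$ is stored as one matrix per value of $\ell$. That matrix is propagated as $(D_n\Phi)\,\alpha^{(\ell)}_{n-1}(D_n\Phi)^T$ plus the outer product of the $\ell$th column of $A^*_n$ (or of $D_n\Gamma_n$ for $\beta_n$), and is then contracted with the rows of $H_n$ as in the index formulas for $C'_{n|1}$ and $L_{n|1}$. The result is compared with the direct definitions $\sum_k((H_nX_{n|k}A^*_k)^{j\ell})^2$ and $\sum_k((H_nX_{n|k}D_k\Gamma_k)^{j\ell})^2$. All numeric values, including the gains, are illustrative assumptions. $\Phi$, $H$ and $\Gamma$ are held constant, and $D_k = I - A^*_kH_k$ is assumed.

```python
# Pure-Python sketch of the alpha_n / beta_n recursions; all numbers illustrative.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def outer(v):
    return [[x * y for y in v] for x in v]

def column(m, ell):
    return [row[ell] for row in m]

def quad(v, m):
    """Return v^T m v."""
    return sum(v[i] * m[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))

I2 = [[1.0, 0.0], [0.0, 1.0]]
Phi = [[1.0, 0.1], [0.0, 1.0]]
H = [[1.0, 0.0], [0.5, 1.0]]
Gam = [[0.0], [1.0]]
gains = [[[0.6, 0.1], [0.2, 0.3]],          # hypothetical A*_1, A*_2, A*_3
         [[0.5, 0.0], [0.1, 0.4]],
         [[0.4, 0.2], [0.0, 0.5]]]
D = [add(I2, [[-x for x in row] for row in matmul(a, H)]) for a in gains]  # I - A*_k H_k

def a_star(k):
    return gains[k - 1]                      # A*_k: gives alpha_n and C'_{n|1}

def d_gamma(k):
    return matmul(D[k - 1], Gam)             # D_k Gamma_k: gives beta_n and L_{n|1}

def X(n, k):
    """X_{n|k} = D_n Phi ... D_{k+1} Phi, with X_{k|k} = I."""
    x = I2
    for i in range(k + 1, n + 1):
        x = matmul(matmul(D[i - 1], Phi), x)
    return x

def recursive_tensor(n, factor):
    """t[ell] is the matrix (t_n)^{m ell u}; t_0 = 0 and
    t_k = (D_k Phi) t_{k-1} (D_k Phi)^T + f f^T, f = column ell of factor(k)."""
    cols = len(factor(1)[0])
    t = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(cols)]
    for k in range(1, n + 1):
        step = matmul(D[k - 1], Phi)
        f = factor(k)
        t = [add(matmul(matmul(step, t[ell]), transpose(step)), outer(column(f, ell)))
             for ell in range(cols)]
    return t

def from_tensor(t):
    """Entry (j, ell) = h^{jT} t^{(ell)} h^j, with h^j the jth row of H."""
    return [[quad(H[j], t[ell]) for ell in range(len(t))] for j in range(len(H))]

def direct(n, factor):
    """Definition: entry (j, ell) = sum_k ((H X_{n|k} factor(k))^{j ell})^2."""
    out = [[0.0] * len(factor(1)[0]) for _ in H]
    for k in range(1, n + 1):
        g = matmul(matmul(H, X(n, k)), factor(k))
        for j, row in enumerate(g):
            for ell, val in enumerate(row):
                out[j][ell] += val ** 2
    return out

n = 3
C_rec, C_dir = from_tensor(recursive_tensor(n, a_star)), direct(n, a_star)
L_rec, L_dir = from_tensor(recursive_tensor(n, d_gamma)), direct(n, d_gamma)
```

The recursive and direct results agree to rounding. The recursion needs only the previous tensor and the current $D_n$, $\Phi(n,n-1)$, $A^*_n$ and $\Gamma_n$.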


From (4.4.21), the conditional mean of the squared $Q$ estimation error is

$$E[(\hat{Q}^{jj}_n - Q^{jj})^2] = J^{jj}_n + (M^{jj}_n)^2 \tag{B.14}$$


where J 


j j _ n-1 

n n 


n-l 


2 

n 


J;* j , + -^ ( (Q + AM 


n 


T*) 2 

n 


(B . 15 ) 


n 


# * * &T 

r ff (p , - u ) r ff 

n n I n rr n 


$$U_n = \Phi(n,n-1)P_{n-1|n-1}\Phi^T(n,n-1)$$


$\Gamma_n^{\#}$ is the generalized inverse of $\Gamma_n$. When $\Gamma_n$ has full column rank (the usual case), it is equal to

$$(\Gamma_n^T\Gamma_n)^{-1}\Gamma_n^T$$
n n' n 


AM = 
n 


„# , * * *_1 

r (p | + p | - p i p i . p i 

n nn nn n n n n-l n n-l 


<fe ^ 

P n|n-1 P n|n-1 P n I n 


* 4m 

+ u - u )r ff 

n n n 
$$U_n = \Phi(n,n-1)P_{n-1|n-1}\Phi^T(n,n-1)$$


$$M_n = \frac{1}{n}\sum_{k=1}^{n}\Delta M_k$$


The unconditional mean of the squared estimation error is then

$$E[(\hat{Q}^{jj}_n - Q^{jj})^2] = E(J^{jj}_n) + E[(M^{jj}_n)^2]$$


But E(J^ j ) = (~i) ECT^) + -^-E[((Q + AM n - T*) jj ) 2 ] (B.16) 

\ I n 


E [ ( (Q 


+ AM - T*) jj ) 2 ] = (Q jj ) 2 + E[(AM=j j ) 2 ] + (T* jj ) 2 (B.17) 
n n n n 


+ 2 E(Q^ AM^) - 2 E(AM^) T*^ - 2 T*^ 


If $R_o = E(R)$ and $Q_o = E(Q)$, then $E(\Delta M_n) = 0$ and


E [ ( (Q + AM n - T*) jj ) 2 ] = (Q^) 2 + E[(AM^ j ) 2 ] + (T* jj ) 2 


+ 2 E(Q^ AM^) - 2 T*^ 


After some manipulation, $\Delta M_n$ can be expressed in the following form:

$$\Delta M_n = Q_o - Q + f_n\left[(R - R_o) + H_n(P_{n|n-1} - P^*_{n|n-1})H_n^T\right]f_n^T$$
where

$$f_n = \Gamma_n^{\#}A^*_n, \quad \text{an } n\times y \text{ matrix}$$

Define

$f^j_n$ = the $j$th column of $f_n^T$, a $y\times 1$ vector

$g^j_n = H_n^Tf^j_n$, a $6\times 1$ vector

$b_{n|n-1} = g^{jT}_nP_{n|n-1}g^j_n$, a scalar

$b^*_{n|n-1} = g^{jT}_nP^*_{n|n-1}g^j_n$, a scalar

$m^j_n = \left[((f^j_n)^1)^2,\ ((f^j_n)^2)^2,\ \ldots,\ ((f^j_n)^y)^2\right]^T$, a $y\times 1$ vector

Then

$$\Delta M^{jj}_n = Q_o^{jj} - Q^{jj} + m^{jT}_n(r - r_o) + b_{n|n-1} - b^*_{n|n-1}$$

where, as before, $r$ and $r_o$ are $y\times 1$ vectors composed of the diagonal elements of $R$ and $R_o$, respectively. Then, after some algebraic manipulation, it can be shown that


E[(AM^) 2 ] = Z + m n T £ p m n + Et(b n|n-1 )2] " {b njn-l ) 

Q R 

-2[E(Q j JbJ | n.i)-Q Jjb *fn-l' + b nfn-l' 
Now $E[(b_{n|n-1})^2]$, $E(Q^{jj}b_{n|n-1})$, and $E(r\,b_{n|n-1})$ must be found.
From (2.3.43),

$$P_{n|n} = D_nP_{n|n-1}D_n^T + A^*_nRA^{*T}_n$$

so

$$P_{n|n-1} = D_n^{-1}\left(P_{n|n} - A^*_nRA^{*T}_n\right)(D_n^T)^{-1}$$

and

$$b_{n|n-1} = g^{jT}_nD_n^{-1}\left(P_{n|n} - A^*_nRA^{*T}_n\right)(D_n^T)^{-1}g^j_n$$


or

$$b_{n|n-1} = g^{jT}_nD_n^{-1}X_{n|o}P_{o|o}X^T_{n|o}(D_n^T)^{-1}g^j_n + \sum_{k=1}^{n}g^{jT}_nD_n^{-1}X_{n|k}\left(A^*_kRA^{*T}_k + D_k\Gamma_kQ\Gamma_k^TD_k^T\right)X^T_{n|k}(D_n^T)^{-1}g^j_n - g^{jT}_nD_n^{-1}A^*_nRA^{*T}_n(D_n^T)^{-1}g^j_n$$


n 


Define ^ |k\» *» 2 ‘ < *> 


k=l 

n 


(w Aii ,3t = T. KV'nikVy*’ 


k=l 


$$\beta^j_{n|o} = g^{jT}_nD_n^{-1}X_{n|o}P_{o|o}X^T_{n|o}(D_n^T)^{-1}g^j_n$$

Then

$$b_{n|n-1} = \beta^j_{n|o} + (U'_{n|1}r + W'_{n|1}q)^j$$
From this it follows that

$$E[(b_{n|n-1})^2] - (b^*_{n|n-1})^2 = \left(U'_{n|1}\Sigma_RU'^T_{n|1} + W'_{n|1}\Sigma_QW'^T_{n|1}\right)^{jj}$$


In a similar fashion it can be shown that

$$E(Q^{jj}b_{n|n-1}) - Q_o^{jj}\,b^*_{n|n-1} = (W'_{n|1})^{jj}\,\Sigma_Q^{jj}$$

and

$$E(r^k\,b_{n|n-1}) - r_o^k\,b^*_{n|n-1} = (\Sigma_RU'^T_{n|1})^{kj}$$


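So

$$E[(\Delta M^{jj}_n)^2] = \Sigma_Q^{jj} + m^{jT}_n\Sigma_Rm^j_n + \left(U'_{n|1}\Sigma_RU'^T_{n|1} + W'_{n|1}\Sigma_QW'^T_{n|1}\right)^{jj} - 2(W'_{n|1})^{jj}\Sigma_Q^{jj} + 2\left(m^{jT}_n\Sigma_RU'^T_{n|1}\right)^{j}$$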


Define

$$W_{n|1} = W'_{n|1} - I$$

$$(U_{n|1})^{jk} = (U'_{n|1})^{jk} + (m^j_n)^k$$


Then

$$E[(\Delta M^{jj}_n)^2] = \left(U_{n|1}\Sigma_RU^T_{n|1} + W_{n|1}\Sigma_QW^T_{n|1}\right)^{jj}$$
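With $W_{n|1} = W'_{n|1} - I$ and row $j$ of $U_{n|1}$ equal to row $j$ of $U'_{n|1}$ plus $m^{jT}_n$, the compact form $(U_{n|1}\Sigma_RU^T_{n|1} + W_{n|1}\Sigma_QW^T_{n|1})^{jj}$ expands to $\Sigma_Q^{jj} + m^{jT}_n\Sigma_Rm^j_n + (U'_{n|1}\Sigma_RU'^T_{n|1} + W'_{n|1}\Sigma_QW'^T_{n|1})^{jj} - 2(W'_{n|1})^{jj}\Sigma_Q^{jj} + 2(m^{jT}_n\Sigma_RU'^T_{n|1})^j$ when $\Sigma_R$ and $\Sigma_Q$ are diagonal. The sketch below checks this algebra numerically. The matrices are arbitrary illustrative values, not quantities from the thesis, with two diagonal elements of $Q$ (rows $j$) and three of $R$.

```python
# Illustrative values: Q has 2 diagonal elements (rows j), R has 3 (y = 3).
U_prime = [[0.4, -0.2, 0.1], [0.1, 0.3, -0.5]]   # U'_{n|1}
W_prime = [[0.5, 0.2], [-0.1, 0.7]]              # W'_{n|1}
m = [[0.9, 0.05, 0.3], [0.2, 0.6, 0.1]]          # m[j] = m_n^j
sigma_r = [0.3, 0.8, 0.1]                        # diagonal of Sigma_R
sigma_q = [0.5, 0.25]                            # diagonal of Sigma_Q

def weighted(a, s, b):
    """a^T diag(s) b."""
    return sum(x * w * y for x, w, y in zip(a, s, b))

def compact(j):
    """(U Sigma_R U^T + W Sigma_Q W^T)^{jj} with U = U' + m rows, W = W' - I."""
    u = [a + b for a, b in zip(U_prime[j], m[j])]
    w = [a - (1.0 if k == j else 0.0) for k, a in enumerate(W_prime[j])]
    return weighted(u, sigma_r, u) + weighted(w, sigma_q, w)

def expanded(j):
    """Term-by-term form before U and W are introduced."""
    return (sigma_q[j] + weighted(m[j], sigma_r, m[j])
            + weighted(U_prime[j], sigma_r, U_prime[j])
            + weighted(W_prime[j], sigma_q, W_prime[j])
            - 2 * W_prime[j][j] * sigma_q[j]
            + 2 * weighted(m[j], sigma_r, U_prime[j]))
```

The two functions agree for every $j$. The $-I$ in $W_{n|1}$ absorbs both $\Sigma_Q^{jj}$ and the $-2(W'_{n|1})^{jj}\Sigma_Q^{jj}$ cross term.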


In a similar fashion it can be shown that 



SO E[((Q + AM n -^)^) 2 J = (Q jj V 3j ) 2 + (Dili E R UA|'i +w A|i^ Q w Au )jj 


and E(jjj j ) = £j^ 

n n 


E(J n-l> + 


R 


Using a procedure similar to that used in finding $E[(F^{jj}_n)^2]$, it can be shown that